1 Sigma algebras and sets


1.1 Set theory 


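Definition 1.1. We define the limit infimum and limit supremum of a sequence of sets $(A_n)_{n \ge 1}$ by
\[ \liminf_{n\to\infty} A_n = \bigcup_{n \ge 1} \bigcap_{m \ge n} A_m = \{x : x \text{ is in all but finitely many of the } A_n\text{'s}\}, \]
\[ \limsup_{n\to\infty} A_n = \bigcap_{n \ge 1} \bigcup_{m \ge n} A_m = \{x : x \text{ is in infinitely many of the } A_n\text{'s}\} \quad (\text{the event } \{A_n \text{ i.o.}\}). \]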


Remark 1.2. Note that
\[ P\left(\limsup_{n\to\infty} A_n\right) = 0 \]
implies that, almost surely, $A_n$ occurs for only finitely many $n \in \mathbb{N}$. In particular, this is useful, for example, with $A_n := \{X > n\}$ to show that $X$ cannot be arbitrarily large.


Observe that for any $N \ge 1$:
\[ \limsup_{n\to\infty} A_n \subseteq \bigcup_{j \ge N} A_j. \]


The following properties are also apparent:

• $\liminf_{n\to\infty} A_n = \left(\limsup_{n\to\infty} A_n^c\right)^c$;
• $\liminf_{n\to\infty} A_n \subseteq \limsup_{n\to\infty} A_n$;

° {Tim sup, Auts %n } € o(Xm, Xm4i---)3 
• $P(\limsup_{n\to\infty} A_n) = P(A_n \text{ i.o.})$.


Theorem 1.3 (Monotonicity of Sequences of Sets and Continuity). We say that $A_n$ increases to $A$, written $A_n \nearrow A$, if $A_n \subseteq A_{n+1}$ for all $n \ge 1$ and $A = \bigcup_{n \ge 1} A_n$. We say that $A_n$ decreases to $A$, written $A_n \searrow A$, if $A_n \supseteq A_{n+1}$ for all $n \ge 1$ and $A = \bigcap_{n \ge 1} A_n$. If $A_n \searrow A$, then by continuity we get that
\[ \lim_{n\to\infty} A_n = \bigcap_{n \ge 1} A_n. \]
For example, if for some sequence of sets $\{E_k\}_{k \ge 1}$ we define $F_n := \bigcup_{m \ge n} E_m$, then we have that $F_n \searrow \limsup_{n\to\infty} E_n$.


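Theorem 1.4 (Continuity of the Lebesgue measure on monotone sets from Real Analysis). Let $\{E_k\}$ be a sequence of measurable sets in $\mathbb{R}^n$.

(1) If $E_k \nearrow E$, then $\lim_{k\to\infty} |E_k| = |E|$.

(2) If $E_k \searrow E$ and $|E_k| < \infty$ for some $k$, then $\lim_{k\to\infty} |E_k| = |E|$.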


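Proof of 1. If for some $j \ge 1$, $|E_j| = \infty$, then $|E_k| = \infty$ for all $k \ge j$, and hence $|E| = \infty$. Now we can assume that $|E_k| < \infty$ for all $k \ge 1$. We write $E$ as a disjoint union:
\[ E = E_1 \cup (E_2 \setminus E_1) \cup \cdots = E_1 \cup \bigcup_{k \ge 1} (E_{k+1} \setminus E_k). \]
Then
\[ |E| = |E_1| + \sum_{k \ge 1} |E_{k+1} \setminus E_k| = |E_1| + \lim_{n\to\infty} \sum_{k=1}^{n-1} \left(|E_{k+1}| - |E_k|\right) = \lim_{n\to\infty} |E_n|. \]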





Proof of 2. We can assume that $|E_1| < \infty$, since otherwise we can truncate the sequence and relabel to obtain the same limit. Moreover, we can write $E_1$ as a disjoint union of measurable sets:
\[ E_1 = E \cup \bigcup_{k \ge 1} (E_k \setminus E_{k+1}). \]
Now we have that
\[ |E_1| = |E| + \sum_{k \ge 1} |E_k \setminus E_{k+1}| = |E| + \lim_{n\to\infty} \sum_{k=1}^{n} \left(|E_k| - |E_{k+1}|\right) = |E| + \lim_{n\to\infty} \left(|E_1| - |E_{n+1}|\right), \]
which implies the result.

















Proposition 1.5. For random variables $X_n, Y_n$ and any $a \in \mathbb{R}$, $\varepsilon > 0$, we have that


limsup{Y, >a+e}c {limsupY, >a} c {limsup Y,, > a—e} 
limsup{Y, < a—e}C {limsupY, < a} c {limsupY, < a+¢}, 


and that 


limsup{X, >a+e}C {lim sup X;, > af C limsup{X, > a—e} 


noo N+ Co Nn—-oco 


lim sup{X, <a-—e}C {lim sup X;, < al C limsup{X, <a+te}. 


n—-Co Noo n—-oco 


1.2 Sigma algebras and sigma fields 


We adopt the conventions that

$\Omega$ = sample space = set of all possible outcomes,
$\mathcal{F}$ = the collection of all events (subsets of the sample space) $\subseteq \mathcal{P}(\Omega)$,
$\mathcal{T} = \bigcap_{n \ge 1} \sigma(\mathcal{F}_n, \mathcal{F}_{n+1}, \ldots)$ (the tail field),

where $\mathcal{P}(\Omega)$ denotes the power set of $\Omega$, or the set of all subsets of $\Omega$.
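Definition 1.6 (Algebras and Sigma Fields). The set (of sets) $\mathcal{F}$ is an algebra (or field) if

(1) $\Omega \in \mathcal{F}$;
(2) $A, B \in \mathcal{F} \implies A \cup B \in \mathcal{F}$;
(3) $A \in \mathcal{F} \implies A^c \in \mathcal{F}$.

Similarly, the set $\mathcal{F}$ is a σ-algebra (or σ-field) if

(1) $\Omega \in \mathcal{F}$;
(2) $A_n \in \mathcal{F}$ for all $n \ge 1 \implies \bigcup_{n \ge 1} A_n \in \mathcal{F}$;
(3) $A \in \mathcal{F} \implies A^c \in \mathcal{F}$.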


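Examples of σ-algebras include the following: $\mathcal{F}_1 = \{\emptyset, \Omega\}$ and $\mathcal{F}_2 = \mathcal{P}(\Omega)$, the power set of $\Omega$. We have the following consequences of these definitions:

• $A, B \in \mathcal{F} \implies A \cap B \in \mathcal{F}$, since $(A \cap B)^c = A^c \cup B^c$;
• $A, B \in \mathcal{F} \implies A \setminus B \in \mathcal{F}$, since $A \setminus B = A \cap B^c = (A^c \cup B)^c$;
• every σ-field is a field;
• For $\mathcal{F}$ a σ-field, $A_n \in \mathcal{F}$ for all $n \ge 1 \implies \bigcap_{n \ge 1} A_n \in \mathcal{F}$.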
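Definition 1.7 (λ-Systems and π-Systems). A set $\mathcal{G} \subseteq \mathcal{P}(\Omega)$ is called a λ-system if

(1) $\Omega \in \mathcal{G}$;
(2) $A_n \in \mathcal{G}$ (disjoint) $\implies \bigcup_{n \ge 1} A_n \in \mathcal{G}$;
(3) $A, B \in \mathcal{G}$, $A \subseteq B \implies B \setminus A \in \mathcal{G}$.

A collection $\mathcal{C} \subseteq \mathcal{P}(\Omega)$ that is closed under intersections (i.e., $A, B \in \mathcal{C} \implies A \cap B \in \mathcal{C}$) is called a π-system. If $\mathcal{C}$ is a λ-system and, in addition, a π-system, then $\mathcal{C}$ is a σ-field.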


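Definition 1.8 (Notation). Let $\mathcal{C}$ be a collection of subsets of $\Omega$. Then $\sigma(\mathcal{C})$ is defined to be the smallest σ-algebra containing $\mathcal{C}$:
\[ \sigma(\mathcal{C}) = \bigcap_{\substack{\mathcal{C} \subseteq \mathcal{F} \\ \mathcal{F} \text{ a σ-algebra}}} \mathcal{F}. \]
Similarly, we define $\lambda(\mathcal{C})$ to be the smallest λ-system containing $\mathcal{C}$.

Theorem 1.9 (Dynkin). If $\mathcal{C}$ is a π-system, then $\lambda(\mathcal{C}) = \sigma(\mathcal{C})$.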











Example 1.10 (The Borel Field of R). Consider the following sets: 
































Then the Borel field $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{C}_1) = \cdots = \sigma(\mathcal{C}_6)$. It happens that all of the $\mathcal{C}_i$'s in this example are also examples of π-systems.











Example 1.11 (Spring 2019, #6). Assume $(\Omega, \mathcal{F}, P)$ is a probability space such that there exist $X_1, X_2 : \Omega \to \mathbb{R}$, two independent Bernoulli random variables such that $P(X_i = 0) = P(X_i = 1) = 1/2$. Show that $\Omega$ must have at least 4 elements. Give an example with $\Omega$ having 4 elements together with a sigma algebra such that on it we can define two independent Bernoulli random variables as above. Can you generalize this?
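One possible four-point construction for the second part of Example 1.11 can be checked mechanically. The sketch below assumes (our choice, not taken from the exam) that $\Omega = \{0,1\}^2$ carries the uniform probability on its power set and that $X_1, X_2$ are the coordinate maps; the names `omega`, `prob`, and `P` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed construction: Omega = {0,1}^2, uniform probability, power-set sigma-algebra.
omega = list(product([0, 1], repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(prob[w] for w in omega if event(w))

X1 = lambda w: w[0]  # first coordinate
X2 = lambda w: w[1]  # second coordinate

# Each X_i should be Bernoulli(1/2).
marginals_ok = all(
    P(lambda w, X=X, v=v: X(w) == v) == Fraction(1, 2)
    for X in (X1, X2) for v in (0, 1)
)

# Independence: the product rule must hold on all four cells {X1 = a, X2 = b}.
independent = all(
    P(lambda w, a=a, b=b: X1(w) == a and X2(w) == b)
    == P(lambda w, a=a: X1(w) == a) * P(lambda w, b=b: X2(w) == b)
    for a, b in product([0, 1], repeat=2)
)
print(marginals_ok, independent)
```

The same computation hints at the lower bound: the four cells $\{X_1 = a, X_2 = b\}$ are disjoint and each has probability $1/4 > 0$, so each must contain at least one outcome.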














2 Probabilities 


2.1 Definitions and basic properties 


Throughout we will use the notation that $\Omega$ denotes the sample space and that $\mathcal{F}$ denotes the σ-field of events. Given a fixed probability $P$, we will make reference to the probability space $(\Omega, \mathcal{F}, P)$.

Definition 2.1. The function $P : \mathcal{F} \to [0,1]$ is a probability if

• $P(\Omega) = 1$; and
• $(A_n)_{n \ge 1} \subseteq \mathcal{F}$ (disjoint) $\implies P\left[\bigcup_{n \ge 1} A_n\right] = \sum_{n \ge 1} P(A_n)$.


The next properties are apparent from the definition of $P$ as a probability function:

• $P(\emptyset) = 0$;
• $P(A^c) = 1 - P(A)$, $\forall A \in \mathcal{F}$;
• $A \subseteq B \implies P(B \setminus A) = P(B) - P(A)$, since $P(B) = P(A) + P(B \setminus A)$.




















Theorem 2.2 (Uniqueness on $\mathbb{R}$). If $P, Q$ are two probability measures on $\mathbb{R}$, and if $P(-\infty, x] = Q(-\infty, x]$ for all $x \in \mathbb{R}$, then $P(A) = Q(A)$ for all $A \in \mathcal{B}(\mathbb{R})$.
































Definition 2.3 (Continuity and Additivity). A probability $P$ is continuous if $A_n \searrow \emptyset \implies P(A_n) \to 0$ as $n \to \infty$. The function $P$ possessing finite additivity is equivalent to the property that whenever $A_1, \ldots, A_n$ are disjoint:
\[ P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i). \]
On the other hand, σ-additivity is equivalent to
\[ P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i), \]
provided that the $A_i$ are disjoint: $A_i \cap A_j = \emptyset$ for all $i \ne j$.


Proposition 2.4. If $P$ is finitely additive and continuous, then it is σ-additive.


Example 2.5 (Properties of Probabilities). We have the following additional special properties of 
probability function P: 


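• Subadditivity: For any $(A_n) \subseteq \mathcal{F}$, $P\left(\bigcup_{n \ge 1} A_n\right) \le \sum_{n \ge 1} P(A_n)$;
• Fatou: If $(A_n) \subseteq \mathcal{F}$ is arbitrary, then
\[ P(\liminf A_n) \le \liminf P(A_n) \le \limsup P(A_n) \le P(\limsup A_n). \]
(Compare to Fatou for expectation: If $X_n \ge 0$, then $E[\liminf X_n] \le \liminf E[X_n]$.)
• Probabilities on countable sets: If $\Omega = \{\omega_1, \omega_2, \ldots\}$ is countable, then for any $A \subseteq \Omega$: $P(A) = \sum_{\omega_i \in A} P(\omega_i)$ and $E[A] = \sum_{\omega_i \in A} \omega_i \cdot P(\omega_i)$.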














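2.2 Distribution functions (on $\mathbb{R}$ and $\mathbb{R}^n$)

Definition 2.6 (Distribution Functions). If $\mu$ is a probability on $\mathcal{B}(\mathbb{R})$, then the distribution function of $\mu$ is $F_\mu(x) := \mu((-\infty, x])$, or equivalently, for a random variable $X$ with law $\mu$, $F_X(t) = P[X \le t]$ for all $t \in \mathbb{R}$. We have the following properties of the distribution function:

(1) $\lim_{x \to -\infty} F_\mu(x) = 0$;

(2) $\lim_{x \to \infty} F_\mu(x) = 1$;

(3) $F_\mu$ is non-decreasing;

(4) $F_\mu$ is right continuous.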


In particular, any function satisfying all of these four properties can be taken (shown) to be a 
distribution function for some random variable. 














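Theorem 2.7. If $\mu, \nu$ are probability measures on $\mathbb{R}$ such that $F_\mu(x) = F_\nu(x)$ for all $x \in \mathbb{R}$, then $\mu(A) = \nu(A)$ for all $A \in \mathcal{B}(\mathbb{R})$.

Theorem 2.8. Let $(\Omega, \mathcal{F})$ be a measurable space with $\mathcal{F} = \sigma(\mathcal{C})$ for $\mathcal{C}$ a π-system. If $P, Q$ are two probabilities on $\mathcal{F}$ such that $P(C) = Q(C)$ for all $C \in \mathcal{C}$, then $P(A) = Q(A)$ for all $A \in \mathcal{F}$.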






































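Example 2.9 (The Dirac Distribution at a Point). For a fixed $a \in \mathbb{R}$, we define
\[ \delta_a(A) = \begin{cases} 1, & a \in A; \\ 0, & a \notin A. \end{cases} \]
As we can check, $\delta_a$ is a probability measure on $\mathbb{R}$ since $\delta_a(\mathbb{R}) = 1$ and $(A_n)$ disjoint $\implies \delta_a\left(\bigcup_{n \ge 1} A_n\right) = \sum_{n \ge 1} \delta_a(A_n)$. The distribution function of $\delta_a$, $F_{\delta_a}$, has the following plot shape for arbitrary $a \in \mathbb{R}$:

[Figure: plot of the step function $F_{\delta_a}$, which jumps from 0 to 1 at $x = a$.]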



































Example 2.10 (The Lebesgue Measure on the Unit Interval). For an interval $[a,b] \subseteq [0,1]$, we define the Lebesgue measure by the function $\lambda([a,b]) = b - a$ for $0 \le a \le b \le 1$, where $\lambda([0,1]^c) = 0$. Then by integration, we can see that the corresponding distribution function satisfies:
\[ F_\lambda(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases} \]
This distribution function has the following plot:

[Figure: plot of $F_\lambda$, rising linearly from 0 at $x = 0$ to 1 at $x = 1$.]











Example 2.11 (Construction of $\mu$, $F_\mu$ from a Function). Provided that the function $F$ satisfies the four required distribution function properties from the definition above, we can construct a probability measure $\mu$ corresponding to this $F$ using the next steps:

• Set $\mu((-\infty, x]) := F(x)$. Then $\mu((a,b]) = \mu((-\infty, b] \setminus (-\infty, a]) = F(b) - F(a)$ for $a < b$.

• To handle the closed intervals, take:
\[ \mu([a,b]) = \lim_{n\to\infty} \mu\left(\left(a - \tfrac{1}{n}, b\right]\right) = \lim_{n\to\infty} \left(F(b) - F(a - 1/n)\right) = F(b) - F(a^-). \]

• We can apply the Carathéodory criterion to see that if $\mu^*$ is an outer measure, then
\[ \mathcal{A} := \{A \subseteq \mathbb{R} : \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \setminus A), \ \forall B \subseteq \mathbb{R}\} \]
is a σ-field.

• Finally, we can define $\mu := \mu^*|_{\mathcal{B}(\mathbb{R})}$, the restriction of $\mu^*$ to the Borel sets.
Definition 2.12 (Probabilities on $\mathbb{R}^n$). Here, for example, we will select $n := 2$ to illustrate the key properties. If $\mu$ is a probability on $\mathbb{R}^2$, then its distribution function $F_\mu(x,y) := \mu((-\infty, x] \times (-\infty, y])$ satisfies:

(1) $\lim_{x, y \to -\infty} F_\mu(x,y) = 0$;
(2) $\lim_{x, y \to \infty} F_\mu(x,y) = 1$;
(3) $F_\mu$ is non-decreasing in each variable;
(4) $F_\mu$ is right continuous in each variable.


Example 2.13 (Spring 2018, #7). Let $F_n, F$ be distribution functions such that $F_n \to F$ weakly. If $F$ is continuous, show that
\[ \sup_x |F_n(x) - F(x)| \to 0. \]

Example 2.14 (Fall 2018, #5). For distribution functions $F, G$ on the real line, define
\[ L(F,G) := \inf\{\varepsilon > 0 : \forall t \in \mathbb{R}, \ F(t) \le G(t + \varepsilon) + \varepsilon, \ G(t) \le F(t + \varepsilon) + \varepsilon\}. \]
It is known that $L$ is a metric. Prove that $L(F_n, F) \to 0$ as $n \to \infty$ if and only if $F_n$ converges weakly to $F$.


2.3 Integration review and properties of distribution functions

The mean value theorem (MVT for derivatives) states that if $f$ is defined and continuous on $[a,b]$ and differentiable on $(a,b)$, then $\exists c \in (a,b)$ such that $(b-a) f'(c) = f(b) - f(a)$. The intermediate value theorem (IVT) states that if $f$ is continuous on $[a,b]$, then for all $c$ strictly between $f(a)$ and $f(b)$, $\exists x \in (a,b)$ such that $f(x) = c$. The MVT for integrals states that if $f$ is continuous on $[a,b]$ then $\exists c \in (a,b)$ such that
\[ f(c) = \frac{1}{b-a} \int_a^b f(t)\,dt. \]

Notice that $f$ monotone increasing on $[a,b] \implies f \in BV[a,b]$, i.e., $f$ has bounded variation on $[a,b]$. If $f$ has bounded variation on $[a,b]$, then we can express $f = g - h$ where $g, h$ are both bounded and monotone increasing on $[a,b]$. This result is known as the Jordan decomposition theorem. Note that these functions can be extended to be monotone increasing on all of $\mathbb{R}$ as well. Also, $f$ monotone increasing implies that $f$ is differentiable a.e., $f' \ge 0$, and $\int_a^b f'(t)\,dt \le f(b) - f(a)$ for all $a < b$.














Proposition 2.15 (Monotone increasing function properties). If $f$ is monotone increasing (especially including distribution functions), then

(a) $f$ has at most countably many points of discontinuity.

(b) $f$ is differentiable a.e.

(c) $f'$ is a measurable function.

(d) $f' \in L^1$ and we have the FTC as an upper bound: $0 \le \int_a^b f' \le f(b) - f(a)$.

Nested characterizations: We say that $f$ is Lipschitz on $[a,b]$ if there exists a $K > 0$ (where $K$ is independent of all $x, y \in [a,b]$) such that $|f(x) - f(y)| \le K \cdot |x - y|$ for all $x, y \in [a,b]$. We have the next important nested inclusion of function types. Here, $C^1$ denotes the set of functions with one continuous derivative:
\[ C^1[a,b] \subsetneq \mathrm{Lip}[a,b] \subsetneq AC[a,b] \subsetneq BV[a,b] \subsetneq L^\infty[a,b]. \]
Typical counterexamples to these types of functions are $\sqrt{x}$ or are of the form $f(x) = x^{a} \sin(1/x^{b})$.


2.4 The product construction 


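Definition 2.16 (Products). Suppose that $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ are two probability spaces. Then the product construction between these two spaces corresponds to taking $\Omega := \Omega_1 \times \Omega_2$, $\mathcal{F} := \mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma(\{A \times B : A \in \mathcal{F}_1, B \in \mathcal{F}_2\})$, and the tensor of the probabilities as $P := P_1 \otimes P_2$, where $P(A \times B) = P_1(A) \cdot P_2(B)$.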


Example 2.17 (Spring 2019, #7). If X,Y are two random variables such that X > Y and X,Y 
have the same distribution, then X = Y almost surely. 


Solution due to Michael Lacey. Assume that $X \le Y$ almost surely (the case $X \ge Y$ in the statement follows by exchanging the roles of $X$ and $Y$). For all $y$,
\[ P(Y \le y) = P(X \le Y \le y) \le P(X \le y). \]
If $X = Y$ a.s. fails, then $\exists y$ so that
\[ P(Y \le y) < P(X \le y). \]
By upper continuity, $\exists y' > y$ so that
\[ P(Y \le y') < P(X \le y). \]
Thus $P(X \le y, Y > y') > 0$, a contradiction.

3 Independence


3.1 Basics and preliminaries 


For our probability space $(\Omega, \mathcal{F}, P)$, in general for any $A, B \in \mathcal{F}$, the conditional probability of $A$ given $B$ has occurred is
\[ P[A \mid B] = \frac{P[A \cap B]}{P[B]}. \]
As a notion of independence of the events $A$ and $B$, we want that observing $B$ does not affect the probability of $A$, i.e., that $P[A \mid B] = P[A]$ provided that $P[B] \ne 0$. This is equivalent to the condition that $P[A \cap B] = P[A] \cdot P[B]$, which also makes sense when $P[B] = 0$. Thus we take this property as our definition of independence: $P[A \cap B] = P[A] \cdot P[B]$.














Example 3.1. First, if $A \in \{\Omega, \emptyset\}$, then $A, B$ are always independent for any $B \in \mathcal{F}$. We have that if $A, B$ are independent, then $A^c, B$ are independent. We can prove this fact using the following logic:
\[ P(A^c \cap B) = P(B \setminus A) = P(B) - P(A \cap B) = P(B) - P(A)P(B) \ \text{(by independence)} = (1 - P(A))P(B) = P(A^c)P(B). \]
So to summarize, any of the elements of $\{\emptyset, \Omega, A, A^c\}$ are independent of any in $\{\emptyset, \Omega, B, B^c\}$ whenever $A, B$ are independent.


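We generalize to the independence of a collection of events by defining that $A_1, A_2, \ldots, A_n$ are independent if for all $1 \le i_1 < i_2 < \cdots < i_k \le n$ (with $2 \le k \le n$),
\[ P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}). \]
Note that this definition implies pairwise independence, but is actually a much stronger requirement. For example, $A, B, C$ are independent if each of $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$, and $P(A \cap B \cap C) = P(A)P(B)P(C)$ holds. In general, for $I$ some indexing set, we make the next definition: $(\mathcal{F}_i)_{i \in I}$ are independent if for all $i_1, \ldots, i_n \in I$ (distinct) with $A_{i_j} \in \mathcal{F}_{i_j}$, we have
\[ P\left(\bigcap_{j=1}^{n} A_{i_j}\right) = \prod_{j=1}^{n} P(A_{i_j}). \]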
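To see that pairwise independence does not imply independence of the whole collection, a small check on two fair coin flips is enough. The events `A`, `B`, `C` below are a standard illustrative choice and are not taken from the notes.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product("HT", repeat=2))  # two fair coin flips, uniform probability

def P(event):
    """Probability of an event given as a set of outcomes."""
    return Fraction(sum(1 for w in omega if w in event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
C = {w for w in omega if w[0] == w[1]}  # the two flips agree

# Every pair satisfies the product rule ...
pairwise = all(P(E & F) == P(E) * P(F) for E, F in combinations([A, B, C], 2))
# ... but the triple product rule fails: P(A ∩ B ∩ C) = 1/4, not 1/8.
triple = P(A & B & C) == P(A) * P(B) * P(C)
print(pairwise, triple)
```

Here each pair is independent, while the three events together are not, which is why the definition quantifies over all sub-collections.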


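Theorem 3.2 (Generation Theorem). For any $i \in I$, assume that $\mathcal{F}_i = \sigma(\mathcal{C}_i)$ for $\mathcal{C}_i$ always a π-system. Then $(\mathcal{F}_i)_{i \in I}$ are independent if and only if for all $i_1, \ldots, i_n \in I$ (distinct), and with $A_{i_j} \in \mathcal{C}_{i_j}$,
\[ P\left(\bigcap_{j=1}^{n} A_{i_j}\right) = \prod_{j=1}^{n} P(A_{i_j}). \]

Corollary 3.3 (Grouping Theorem). If $(\mathcal{F}_i)_{i \in I}$ are independent, and if $\{I_j\}_{j \in J}$ partition $I$ as $I = \bigcup_{j \in J} I_j$, and if $\mathcal{G}_j := \sigma(\{\mathcal{F}_i : i \in I_j\})$, then the $(\mathcal{G}_j)_{j \in J}$ are independent.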














Example 3.4 (Independent Groupings). Suppose that the sequence of $(A_n)_{n \ge 1}$ are independent. Then for

$B_1 := A_1 \cup A_3 \cup A_5 \cup A_7 \cup \cdots$
$B_2 := (A_2 \cap A_4 \cap A_6) \cup (A_8 \cap A_{10} \cap A_{12}) \cup \cdots$
$C_1 := A_1 \cup A_4 \cup A_7 \cup A_{10} \cup \cdots$
$C_2 := (A_2 \cap A_5) \cup (A_8 \cap A_{11}) \cup \cdots$
$C_3 := (A_3 \cap A_6) \cup (A_9 \cap A_{12}) \cup \cdots,$

we have that $B_1, B_2$ and $C_1, C_2, C_3$ are each independent from the "grouping" corollary above.

Example 3.5 (A Set Independent of Itself). Observe that if $A$ is independent of itself (henceforth, a.s. constant per exam 2), then $P(A) \in \{0,1\}$. This happens because of our independence requirement that $P(A) = P(A \cap A) = P(A)^2 \implies P(A) \in \{0,1\}$.


Example 3.6 (A Three-Variable Proof). If $A, B, C$ are independent, then we can prove also that $A^c, B, C$ are independent:
\[ P(A^c \cap B \cap C) = P((B \cap C) \setminus A) = P(B \cap C) - P(B \cap C \cap A) = P(B)P(C) - P(B)P(A)P(C) = (1 - P(A))P(B)P(C) = P(A^c)P(B)P(C). \]


Example 3.7 (Fall 2016, #6). Let $\{A_n\}$ be an infinite collection of independent events. Suppose that $P(A_n) < 1$ for every $n \ge 1$. Show that $P(A_n \text{ i.o.}) = 1$ if and only if $P\left(\bigcup_n A_n\right) = 1$.


3.2 Key examples and results 


Theorem 3.8 (Kolmogorov's 0-1 Law). Let $(\mathcal{F}_n)_{n \ge 1}$ be an infinite sequence of σ-fields. The corresponding, so-called tail field is
\[ \mathcal{T} := \bigcap_{n \ge 1} \sigma(\mathcal{F}_n, \mathcal{F}_{n+1}, \ldots), \]
where the σ-fields $\sigma(\mathcal{F}_n, \mathcal{F}_{n+1}, \ldots)$ are decreasing with $n$. If the $(\mathcal{F}_n)_{n \ge 1}$ are independent, then $P(A) \in \{0,1\}$ for all $A \in \mathcal{T}$. As a consequence of this phenomenon, we observe that if $(A_n)_{n \ge 1}$ are independent, then

(1) $P(\limsup_{n\to\infty} A_n) \in \{0,1\}$; and

(2) $P(\liminf_{n\to\infty} A_n) \in \{0,1\}$.


Theorem 3.9 (Borel-Cantelli Lemma (Important, Used Frequently on the Exam)). If $(A_n)_{n \ge 1}$ is such that $\sum_{n \ge 1} P(A_n) < \infty$, then importantly, $P(\limsup_{n\to\infty} A_n) = 0$. This observation is later employed to guarantee almost sure convergence of the indicators $1_{A_n}$ to zero provided that the latter sum converges. Secondly, if $(A_n)_{n \ge 1}$ are independent, then
\[ \sum_{n \ge 1} P(A_n) = \infty \implies P\left(\limsup_{n\to\infty} A_n\right) = 1. \]

Proof Note: We can utilize the trick that for any set $A \in \mathcal{F}$, $P(A^c) = 1 - P(A) \le e^{-P(A)}$, which is essentially the same as saying that $1 - x \le e^{-x}$ whenever $x \in [0,1]$. Another useful inequality is that if $|b| \le 1$, then $|e^b - (1 + b)| \le |b|^2$.
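The inequality $1 - x \le e^{-x}$ is what drives the second half of Borel-Cantelli: for independent events, $P(\text{none of } A_N, \ldots, A_M \text{ occurs}) = \prod_{n=N}^{M} (1 - P(A_n)) \le \exp\left(-\sum_{n=N}^{M} P(A_n)\right)$. A minimal numerical sketch, assuming the illustrative choice $P(A_n) = 1/n$ (our assumption; the helper name `tail_bound` is hypothetical):

```python
import math

def tail_bound(N, M):
    """Return (exact, bound) for P(none of A_N, ..., A_M occurs).

    Assumes independent events with the illustrative choice P(A_n) = 1/n.
    """
    ps = [1.0 / n for n in range(N, M + 1)]
    exact = 1.0
    for p in ps:
        exact *= 1.0 - p          # independence: product of P(A_n^c)
    bound = math.exp(-sum(ps))    # uses 1 - x <= exp(-x) term by term
    return exact, bound

for M in (100, 10_000, 1_000_000):
    exact, bound = tail_bound(10, M)
    print(M, exact, bound)
```

Because $\sum 1/n$ diverges, both columns tend to 0 as $M$ grows, so for every $N$ at least one $A_n$ with $n \ge N$ occurs almost surely, which is the statement $P(\limsup A_n) = 1$.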











Example 3.10 (Spring 2019, #2). If $(X_n)_{n \ge 1}$ is a sequence of random variables, then there exists a sequence $(c_n)_{n \ge 1}$ with $c_n \to \infty$ such that
\[ P\left(\lim_{n\to\infty} \frac{X_n}{c_n} = 0\right) = 1. \]


(Hint: Think an easy application of the Borel-Cantelli lemma.) 


Example 3.11 (Spring 2018, #8). Let $(X_n)_{n \ge 1}$ be an iid sequence of random variables. Show that $E[X_1^2] < \infty$ if and only if for every $c > 0$, $P(|X_n| > c\sqrt{n} \text{ infinitely often}) = 0$.

The next example gives an interpretation of the condition that $\limsup_{n\to\infty} A_n = 0$ in terms of probabilities. This technique showed up 2-3 times on the Fall 2019 comprehensive exam.


Example 3.12 (Fall 2019, #2). Assume $(X_n)_{n \ge 1}$ are iid random variables on some space $(\Omega, \mathcal{F}, P)$ with common Gumbel cumulative function given by
\[ F(x) = e^{-e^{-x}}, \quad x \in \mathbb{R}. \]
Show that
\[ \limsup_{n\to\infty} \left(-X_n - \log\log(n)\right) = 0. \]


Sketch of the general technique. We have to prove two things:

1) $P(\limsup_{n\to\infty} (-\log\log n - X_n) > \delta) = 0$, $\forall \delta > 0$;

2) $P(\limsup_{n\to\infty} (-\log\log n - X_n) > -\delta) = 1$, $\forall \delta > 0$.

To do this we can utilize the Borel-Cantelli lemma.
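For the Gumbel example, the Borel-Cantelli sums can be written explicitly: with $F(x) = e^{-e^{-x}}$ one gets $P(-X_n - \log\log n > \delta) = F(-\log\log n - \delta) = n^{-e^{\delta}}$, which is summable exactly when $\delta > 0$. The sketch below evaluates partial sums for $\delta = \pm 0.5$; the helper names are ours and the sums start at $n = 3$ so that $\log\log n > 0$.

```python
import math

def gumbel_cdf(x):
    """Gumbel distribution function F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def tail_prob(n, delta):
    """P(-X_n - log log n > delta) = F(-log log n - delta) = n ** (-exp(delta))."""
    return gumbel_cdf(-math.log(math.log(n)) - delta)

def partial_sum(delta, n_max):
    """Partial sum of the Borel-Cantelli series over 3 <= n <= n_max."""
    return sum(tail_prob(n, delta) for n in range(3, n_max + 1))

for delta in (0.5, -0.5):
    # delta > 0: the partial sums stabilise; delta < 0: they keep growing.
    print(delta, [round(partial_sum(delta, 10**k), 3) for k in (2, 3, 4, 5)])
```

For $\delta > 0$ the first Borel-Cantelli lemma gives 1); for $-\delta < 0$ the series diverges and, using independence of the $X_n$, the second lemma gives 2).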





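Example 3.13 (Preview of Independent Bernoulli Random Variables). For $n \ge 1$, let $X_n \sim \mathrm{Bernoulli}(p_n)$ so that $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$. Then we claim that $X_n \xrightarrow{a.s.} 0 \Longleftarrow \sum_{n \ge 1} p_n < \infty$. This is true since if the right-hand-side sum converges then, for any $\varepsilon \in (0,1)$,
\[ \sum_{n \ge 1} P(X_n = 1) = \sum_{n \ge 1} P(|X_n - 0| > \varepsilon) < \infty, \]
so that by Borel-Cantelli $P\left(\limsup_{n\to\infty} \{|X_n| > \varepsilon\}\right) = 0$.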

















Example 3.14. If $X_1, \ldots, X_n$ are independent, then for $\phi_1, \ldots, \phi_n : \mathbb{R} \to \mathbb{R}$ measurable, the random variables $Y_i := \phi_i(X_i)$ are still independent. Why is this the case?
\[ \{Y_i \in A_i\} = \{\phi_i(X_i) \in A_i\} = \{X_i \in \phi_i^{-1}(A_i)\}, \]
where the $\phi_i^{-1}(A_i)$'s are measurable. Hence, the $Y_i$'s are independent as well.

4 Random variables


4.1 Definitions and basic properties 











Definition 4.1. A function $X : \Omega \to \mathbb{R}$ defined on the sample space is a random variable if $X^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{B}(\mathbb{R})$. Here, we define $X^{-1}(A) := \{\omega : X(\omega) \in A\} = \{X \in A\}$, where the second set notation is a looser shorthand that is convenient for understanding. We immediately get the following properties, which are apparent given this notation for a r.v. $X$:

(1) $X^{-1}\left(\bigcup_{i \in I} A_i\right) = \bigcup_{i \in I} X^{-1}(A_i)$;
(2) $\{X \in \bigcap_{i \in I} A_i\} = \bigcap_{i \in I} \{X \in A_i\}$; and
(3) $\{X \in A^c\} = \{X \in A\}^c$, or in other words, $X^{-1}(A^c) = X^{-1}(A)^c$.

For any fixed random variable $X$, we get a corresponding probability measure $\mu_X$ defined by
\[ \mu_X(B) = P(X \in B) = P(X^{-1}(B)), \quad \forall B \in \mathcal{B}(\mathbb{R}). \]











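Generalizations: We can extend this notion more generally by setting $(\Omega, \mathcal{F}) \to (S, \mathcal{S})$, with $\mathcal{F}, \mathcal{S}$ both σ-fields. In particular, an $S$-valued random variable is a function $X : \Omega \to S$ such that $\{X \in A\} \in \mathcal{F}$ for all $A \in \mathcal{S}$. The next couple of results fill in some key details for extended random variables of this type.

Proposition 4.2. If $X : \Omega \to S$ is a random variable, and $\mathcal{S}$ a σ-field, then $\mathcal{F}_X = X^{-1}(\mathcal{S}) = \{X^{-1}(A) : A \in \mathcal{S}\}$ is a σ-field. Moreover, if $X : \Omega \to S$, with $\mathcal{S} = \sigma(\mathcal{C})$, then $\sigma(X^{-1}(\mathcal{C})) = X^{-1}(\mathcal{S}) = X^{-1}(\sigma(\mathcal{C}))$.

Corollary 4.3 (Characterization for the $\mathbb{R}$ Case). If $\mathcal{S} = \sigma(\mathcal{C})$, then $X : \Omega \to S$ is a random variable if and only if $X^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{C}$. In the real-field case, $X : \Omega \to \mathbb{R}$ is a random variable if and only if $\forall x \in \mathbb{R}$, the sets $\{X \le x\} \in \mathcal{F}$.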






































Proposition 4.4 (Useful General Facts for the Real Case). We have the next several observations 
which are collected together here as a matter of reference and completeness: 





(1) If $X_1, \ldots, X_n$ are random variables on $\Omega$, then the tuple $X = (X_1, \ldots, X_n) : \Omega \to \mathbb{R}^n$ is an $\mathbb{R}^n$-valued random variable;

(2) If $g : \mathbb{R}^n \to \mathbb{R}$ is a measurable function, i.e., if $g^{-1}(A) \in \mathcal{B}(\mathbb{R}^n)$ for all $A \in \mathcal{B}(\mathbb{R})$, then $g(X_1, \ldots, X_n)$ is a random variable when the $X_i$'s are;

(3) If $g : \mathbb{R}^n \to \mathbb{R}$ is continuous, then $g$ is measurable;

(4) Examples of measurable functions on $\mathbb{R}^2$ include: $g(x,y) = x + y$, $x - y$, $xy$, $x/y$ ($y \ne 0$), and $g(x,y) = \chi_A(x)\chi_B(y)$ for $A, B \in \mathcal{B}(\mathbb{R})$; all are measurable wherever they are defined.














Example 4.5 (Limits). If $\{X_n\}_{n \ge 1}$ is any sequence of random variables, then
\[ Y := \inf_n X_n, \ \sup_n X_n, \ \liminf_n X_n, \ \limsup_n X_n \]
are all random variables.


Example 4.6 (Spring 2017, # 4). Let X,Y be independent, and suppose that each has a uniform 
distribution on (0,1). Let Z := min(X,Y). Find the density fz(z) for Z. 




4.2 Independence 


Definition 4.7 (Independence). If $(X_i)_{i \in I}$ are random variables on the probability space $(\Omega, \mathcal{F}, P)$, we say that the family is independent if $(\mathcal{F}_{X_i})_{i \in I}$ are independent. Here, we write $\mathcal{F}_{X_i} = \{X_i^{-1}(A) : A \in \mathcal{B}(\mathbb{R})\}$.

The next is a key characterization concerning the independence of random variables: $(X_i)_{i \in I}$ are independent if whenever $i_1, \ldots, i_n \in I$ are distinct,
\[ P(X_{i_1} \le x_{i_1}, \ldots, X_{i_n} \le x_{i_n}) = P(X_{i_1} \le x_{i_1}) \cdots P(X_{i_n} \le x_{i_n}), \]
for any $x_{i_1}, \ldots, x_{i_n} \in \mathbb{R}$.





Proposition 4.8. If $X_1, \ldots, X_n$ are independent random variables and $g_1, \ldots, g_n$ are Borel measurable functions, then $g_1(X_1), \ldots, g_n(X_n)$ are also independent.

Proposition 4.9. If $X, Y$ are independent random variables with both $E|X|, E|Y| < \infty$, then $E|XY| < \infty$ and $E[XY] = E[X]E[Y]$.

Remark 4.10 (Orthogonal Random Variables). Random variables $X, Y$ such that $E[XY] = E[X]E[Y]$ are called orthogonal. It is clear from the above that independent random variables with finite expectation are orthogonal. We also have that if $X_1, \ldots, X_n$ are random variables with finite variance, and if $X_1, \ldots, X_n$ are pairwise orthogonal, then
\[ \mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n). \]

Example 4.11 (Spring 2018, #3). Let $X, Y$ be random variables with $E|X|, E|Y| < \infty$. Prove that if $E[X \mid Y] = Y$ and $E[Y \mid X] = X$, then $X = Y$ a.s.


Example 4.12 (Fall 2018, #2). Suppose $X_1, \ldots, X_n$ are iid random variables such that $P(X_i = +1) = P(X_i = -1) = 1/2$. Let $S_k := X_1 + \cdots + X_k$ for $1 \le k \le n$. Prove that
\[ P\left(\max_{1 \le k \le n} S_k \ge x\right) = 2P(S_n > x) + P(S_n = x). \]
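For small $n$ the identity in Example 4.12 can be checked by brute force. The sketch below reads the statement as the classical reflection identity $P(\max_{1 \le k \le n} S_k \ge x) = 2P(S_n > x) + P(S_n = x)$ for integers $x \ge 1$ (this reading is our assumption) and enumerates all $2^n$ equally likely sign paths; `check_reflection` is a hypothetical helper name.

```python
from itertools import product

def check_reflection(n, x):
    """Count paths on both sides of the identity over all 2^n sign sequences.

    Returns (lhs, rhs), where lhs = #{max_k S_k >= x} and
    rhs = 2 * #{S_n > x} + #{S_n = x}; dividing by 2^n gives probabilities.
    """
    lhs = rhs = 0
    for steps in product((1, -1), repeat=n):
        partial = 0
        running_max = None
        for s in steps:
            partial += s
            running_max = partial if running_max is None else max(running_max, partial)
        lhs += running_max >= x
        rhs += 2 * (partial > x) + (partial == x)
    return lhs, rhs

for n in range(1, 9):
    for x in range(1, n + 1):
        lhs, rhs = check_reflection(n, x)
        assert lhs == rhs, (n, x)
print("identity verified for all 1 <= x <= n <= 8")
```

This is only a sanity check, not a proof; the proof pairs each path that hits level $x$ and ends below it with its reflection after the first hitting time.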


Example 4.13 (Spring 2019, #8). Assume that X1, X2,...,X, are iid with density f(x) = 5 for 
x > 1 and 0 otherwise. Define 


1 
My, == marl X7,9/2 X32 65 4/0 Xa): 
n 
Show that M,, converges in distribution and find the limit. 


(Hint: Look at the distribution function and take the logarithm of the product, where $1 + \frac{1}{2} + \cdots + \frac{1}{n} \approx \log(n) + \gamma$ and $\log(1 - x) \approx -x$ for $0 < x < 1$.)


4.3 Conditional expectation

Most importantly, we must remember the property that
\[ E[E[X \mid Y]] = E[X], \]
and learn where to apply this phenomenon. We also have the following consequences of conditional expectation:

• $E[Y \mid Y] = Y$;
• If $g, f$ are Borel functions such that $g(Z), f(X)$ are integrable, then $E[E[g(Z)f(X) \mid Z]] = E[g(Z)f(X)]$.


Example 4.14 (Fall 2018, #8). Let $X$ and $Y$ be two independent and positive random variables with respective densities $f_X, f_Y$, and let $g : (0,\infty) \to (0,\infty)$ be a bounded Borel function. Find
\[ E\left[g\left(\frac{X}{Y}\right) \,\middle|\, Y\right], \]
the conditional expectation of $g(X/Y)$ given $Y$, and then infer that $V := X/Y$ has a density that you will identify.

















Example 4.15 (Fall 2018, #9). Let $X, Y, Z$ be random variables such that $(X, Z)$ and $(Y, Z)$ are identically distributed. Let $f : \mathbb{R} \to \mathbb{R}$ be a Borel function such that $f(X)$ is integrable.

(i) Show that $E[f(X) \mid Z] = E[f(Y) \mid Z]$, a.s.; and

(ii) Let $T_1, \ldots, T_n$ be iid random variables with finite first moment, and let $T = T_1 + \cdots + T_n$. Using (i), show that
\[ E[T_1 \mid T] = \frac{T}{n}. \]

(Hint: If $T_1, \ldots, T_n$ are iid and $T := T_1 + \cdots + T_n$, then the pairs $(T_1, T), \ldots, (T_n, T)$ are identically distributed.)


4.4 Common distributions 


Definition 4.16. We introduce models for some of the most common discrete probability distributions below. In each of the following cases, $X : \Omega \to \mathbb{R}$ (or possibly $\mathbb{R}^n$) is a random variable, and its probability distribution, or measure function which is what varies from case to case, is denoted by $\mu_X$:

• Bernoulli: For a fixed probability $p \in (0,1)$, we say that $X \sim \mathrm{Bernoulli}(p)$ if $P(X = 1) = p$ and $P(X = 0) = 1 - p$. Here, we express the measure function by $\mu_X = p\delta_1 + (1 - p)\delta_0$, where
\[ \delta_a(A) = \begin{cases} 1, & a \in A; \\ 0, & a \notin A, \end{cases} \]
is the Dirac delta measure. Indeed, we can see that this expansion is correct since 1) $\mu_X$ is additive (because the $\delta_i$'s are); and 2) it is a probability measure as $\mu_X(\mathbb{R}) = p\delta_1(\mathbb{R}) + (1 - p)\delta_0(\mathbb{R}) = 1$. It is standard information that $E[X] = p$, and $\mathrm{Var}(X) = pq$, where $q = 1 - p$ is the flipped probability for the distribution.

Sums of Bernoulli random variables: Note that a sum of $n$ iid Bernoulli random variables with parameter $p$ has mean $\mu = np$ and variance $np(1 - p)$. In particular, if $(X_n)_{n \ge 1}$ are iid with $X_n \sim \mathrm{Bernoulli}(p)$, then $S_n := X_1 + \cdots + X_n \sim \mathrm{Binomial}(n, p)$.

Flipping the inequality in large deviations: If $X_i \sim \mathrm{Ber}(1/2)$ and $Y_i := 1 - X_i$, then also $Y_i \sim \mathrm{Ber}(1/2)$. Thus
\[ \lim_{n\to\infty} \frac{1}{n} \log P(S_n \le an) = -I(1 - a), \quad a < 1/2. \]

• Uniform: $X \sim \mathrm{Uniform}$ (on a finite set) if $S = \{x_1, \ldots, x_n\}$ and we have that $\mu_X(\{x_1\}) = \cdots = \mu_X(\{x_n\}) = \frac{1}{n}$. Alternately, we can state this condition as $X : \Omega \to \mathbb{R}$ is uniform on $S$ provided that $P(X = x_1) = \cdots = P(X = x_n) = \frac{1}{n}$.




• Geometric: $X \sim \mathrm{Geom}(p)$, which models
$X :=$ # flips in a sequence of coin tosses until the first H occurs.
We have: $P(X = k) = p(1 - p)^{k-1}$ for any natural number $k \ge 1$.


• Binomial: $X \sim \mathrm{Binomial}(n, p)$, where $X$ models $X :=$ # H's in $n$ coin flips, and is such that $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for $0 \le k \le n$. We can show that $E[X] =: \mu = np$, and $\mathrm{Var}(X) = np(1 - p) = npq$. The factorial moments are given by $E[(X)_k] = (n)_k p^k$.

• Poisson: $X \sim \mathrm{Poisson}(\lambda)$, for $\lambda > 0$, if for all integers $k \ge 0$: $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$. The Poisson distribution naturally arises as the limit of a binomial distribution when the expected number of successes stays fixed at $\lambda$: namely, the limiting distribution of $X \sim \mathrm{Binomial}(n, \lambda/n)$ as $n \to \infty$. It can be verified that if $X$ is Poisson with parameter $\lambda > 0$, then $E[X] = \lambda$, $E[X(X - 1)] = \lambda^2$, and $\mathrm{Var}(X) = E[X^2] - E[X]^2 = \lambda$. The factorial moments are given by $E[(X)_k] = \lambda^k$.

Sums of Poisson random variables: The sum of independent Poisson random variables is another Poisson random variable. Namely, if $(X_n)_{n \ge 1}$ are independent random variables such that $X_i \sim \mathrm{Poisson}(\lambda_i)$, then $S_n := X_1 + \cdots + X_n \sim \mathrm{Poisson}\left(\sum_{i=1}^{n} \lambda_i\right)$. A special case is that the sum of $n$ independent Poisson(1) random variables has the distribution Poisson($n$), with the corresponding characteristic function $\varphi_n(t) = e^{n(e^{it} - 1)}$ for $t \in \mathbb{R}$.
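Two of the Poisson facts above are easy to check numerically: the convolution of independent Poisson laws is again Poisson, and Binomial($n$, $\lambda/n$) approaches Poisson($\lambda$). The sketch below uses only the standard library; the parameter values and function names are illustrative assumptions.

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def convolve_pmf(lam1, lam2, k):
    """P(X1 + X2 = k) for independent X1 ~ Poisson(lam1), X2 ~ Poisson(lam2)."""
    return sum(poisson_pmf(lam1, j) * poisson_pmf(lam2, k - j) for j in range(k + 1))

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Sum of independent Poisson(1.5) and Poisson(2.5) versus Poisson(4).
for k in range(6):
    print(k, convolve_pmf(1.5, 2.5, k), poisson_pmf(4.0, k))

# Binomial(n, 2/n) at k = 3 versus Poisson(2) at k = 3 as n grows.
for n in (10, 100, 10_000):
    print(n, binomial_pmf(n, 2.0 / n, 3), poisson_pmf(2.0, 3))
```

The first loop prints matching columns up to rounding error; the second shows the binomial probabilities approaching the Poisson value.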














Example 4.17 (Preview of Continuous Random Variable Distributions). $X$ is a continuous random variable if its probability measure, $\mu_X$, has a consistent density function. Namely, $f$ satisfying $f \ge 0$, $\int_{\mathbb{R}} f = 1$, and where $\mu_X(A) = \int_A f(x)\,dx$. The following listing provides a brief overview of the properties of several of the most common cases we will see:


• Exponential: With the construction of this model distribution, we have that the density function is non-zero, and equal to $f(x) = \lambda e^{-\lambda x}$, if and only if $x \ge 0$. It is defined to be zero otherwise. Here, to ensure that we get a resulting probability measure, the requirement is that $\lambda > 0$, for example as in other instances in the next sections, with $\lambda := 1$. Note that since we have that $P(X > t) = e^{-\lambda t}$ $\forall t \ge 0$, we can compute that
\[ E[X] = \int_0^\infty \lambda x e^{-\lambda x}\,dx = \frac{1}{\lambda}, \quad E[X^2] = \frac{2}{\lambda^2}, \quad \mathrm{Var}(X) = E[X^2] - E[X]^2 = \frac{1}{\lambda^2}, \quad E[X^n] = \frac{n!}{\lambda^n}. \]
The exponential distribution has the so-called loss-of-memory property, which is to say that for all $s, t \ge 0$, $P(X > s + t \mid X > s) = P(X > t)$.

Sums of exponential random variables: Suppose that $(Y_k)_{k \ge 1}$ are iid with $Y_k \sim \mathrm{Exp}(\lambda)$. Then the CDF of the sum $S_n := Y_1 + \cdots + Y_n$ is given by
\[ P(Y_1 + \cdots + Y_n \le t) = 1 - e^{-\lambda t} \sum_{k=0}^{n-1} \frac{(\lambda t)^k}{k!}, \quad t \in \mathbb{R}_{\ge 0}. \]


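Suppose that $X_i \sim \mathrm{Exp}(\lambda_i)$ are independent for $\lambda_i > 0$ and $i = 1, 2$. Then if $Z := X_1 + X_2$, the PDF of $Z$ is given by (for $x \ge 0$)
\[ f_Z(x) = \begin{cases} \dfrac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 x} - e^{-\lambda_2 x}\right), & \lambda_1 \ne \lambda_2; \\ \lambda^2 x e^{-\lambda x}, & \lambda_1 = \lambda_2 = \lambda. \end{cases} \]

• Uniform: A.k.a. the non-discretized version of a uniformly distributed random variable on some $[a,b] \subseteq \mathbb{R}$. Then the density and cumulative distribution functions respectively have the formulas:
\[ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b; \\ 0, & \text{otherwise}, \end{cases} \qquad F(x) = \begin{cases} 0, & x < a; \\ \dfrac{x-a}{b-a}, & a \le x \le b; \\ 1, & x > b. \end{cases} \]
Here, we have that $\mu = \frac{1}{2}(a + b)$ and $\sigma^2 = \frac{(b-a)^2}{12}$, where the density function can be rewritten as
\[ f(x) = \begin{cases} \dfrac{1}{2\sigma\sqrt{3}}, & -\sigma\sqrt{3} \le x - \mu \le \sigma\sqrt{3}; \\ 0, & \text{otherwise}. \end{cases} \]
We also have that the $n^{th}$ moments are given by
\[ E[X^n] = \frac{1}{n+1} \sum_{k=0}^{n} a^k b^{n-k}. \]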





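• Normal and Standard Normal: $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. See Section 7 for a substantially complete treatment. Note that for independent $X_i \sim N(0,1)$, $Y := \sum_{i=1}^{n} \sigma_i X_i \sim N(0, \sigma_1^2 + \cdots + \sigma_n^2)$. If the $X_i \sim N(\mu_i, \sigma_i^2)$ are independent, then $Y := \sum_{i=1}^{n} X_i \sim N(\mu_1 + \cdots + \mu_n, \sigma_1^2 + \cdots + \sigma_n^2)$. If the $X_i \sim N(0,1)$ are iid, then $Z_n := X_1 + \cdots + X_n$ has the same distribution as $\sqrt{n} Z$ for $Z \sim N(0,1)$. If $Y \sim N(\mu, \sigma^2)$ and $Z \sim N(0,1)$, then $Y$ has the same distribution as $\mu + \sigma Z$.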


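• Gamma: For parameters $\lambda, \alpha > 0$, we say that $X \sim \mathrm{Gamma}(\lambda, \alpha)$ if
\[ f(x) = \frac{\lambda^\alpha x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)} \]
for $x \ge 0$. Here, the gamma function is defined by the usual integral $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx$. We have the following properties of composite (sum) distributions that result in gamma-distributed random variables:

(G.0) If $\alpha = 1$ then $\mathrm{Gamma}(\lambda, 1) = \mathrm{Exp}(\lambda)$ is the exponential distribution;

(G.1) If $X_1, \ldots, X_n$ are iid with $X_i \sim \mathrm{Exp}(\lambda)$, then $X_1 + \cdots + X_n \sim \mathrm{Gamma}(\lambda, n)$;

(G.2) More generally, if $X_1, \ldots, X_n$ are independent random variables with $X_i \sim \mathrm{Gamma}(\lambda, \alpha_i)$, then $X_1 + \cdots + X_n \sim \mathrm{Gamma}(\lambda, \alpha_1 + \cdots + \alpha_n)$;

(G.3) If $Z \sim N(0,1)$ is standard normal, then $Z^2 \sim \mathrm{Gamma}\left(\frac{1}{2}, \frac{1}{2}\right)$;

(G.4) If $n \ge 1$ is an integer then the distribution $\mathrm{Gamma}\left(\frac{1}{2}, \frac{n}{2}\right)$ is called the $\chi^2$-distribution with $n$ degrees of freedom, which also corresponds to the distribution of $Z_1^2 + \cdots + Z_n^2$ where $Z_1, \ldots, Z_n$ are iid standard normal random variables.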


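• Product distribution: For two independent random variables $X, Y$ with densities $f_X, f_Y$, let $Z := XY$. Then the respective PDF and CDF of $Z$ are given by
\[ f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y\!\left(\frac{z}{x}\right) \frac{1}{|x|}\,dx, \]
\[ F_Z(z) = \int_0^{\infty} f_X(x) \int_{-\infty}^{z/x} f_Y(y)\,dy\,dx + \int_{-\infty}^{0} f_X(x) \int_{z/x}^{\infty} f_Y(y)\,dy\,dx. \]
If $X, Y$ are both square integrable, then
\[ \mathrm{Var}(XY) = (\sigma_X^2 + \mu_X^2)(\sigma_Y^2 + \mu_Y^2) - \mu_X^2 \mu_Y^2. \]
The characteristic function of $Z$ is given by
\[ \varphi_Z(t) = E_Y[\varphi_X(tY)] = E_X[\varphi_Y(tX)]. \]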
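As a cross-check on the exponential and gamma items in the list above, the closed-form CDF of a sum of $n$ iid $\mathrm{Exp}(\lambda)$ variables can be compared with a numerical integral of the $\mathrm{Gamma}(\lambda, n)$ density from (G.1). This is a sketch under the assumption that the sum in the CDF starts at $k = 0$ (the standard Erlang form); the function names and the Simpson-rule helper are ours.

```python
import math

def erlang_cdf(n, lam, t):
    """Closed form 1 - exp(-lam t) * sum_{k=0}^{n-1} (lam t)^k / k!."""
    partial = sum((lam * t) ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-lam * t) * partial

def gamma_density(n, lam, x):
    """Density of Gamma(lam, n): lam^n x^(n-1) exp(-lam x) / Gamma(n)."""
    return lam**n * x ** (n - 1) * math.exp(-lam * x) / math.gamma(n)

def integrate(f, a, b, steps=20_000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

n, lam, t = 3, 2.0, 1.7  # illustrative parameter values
numeric = integrate(lambda x: gamma_density(n, lam, x), 0.0, t)
print(erlang_cdf(n, lam, t), numeric)
```

The two printed numbers should agree to many decimal places, which is consistent with (G.1) and with starting the sum at $k = 0$.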
Examples and exercises on exponential random variables


Exercise 4.18. Show that if $X \sim \mathrm{Exp}(\lambda = 1)$ and $U := e^{-X}$, then $U \sim \mathrm{Uniform}(0,1)$, i.e., prove that $e^{-X}$ has the same distribution as a Uniform(0,1) random variable.

Solution to Exercise (I): Compute matching moments of each rv. We know by a fact that the $n^{th}$ moment of the uniform distribution on $[a,b]$ is given by the following equation for $Y \sim \mathrm{Uniform}(a,b)$:
\[ E[Y^n] = \frac{1}{n+1} \sum_{k=0}^{n} a^k b^{n-k}. \]
This implies that the $n^{th}$ moment for $n \ge 1$ of Uniform(0,1) is given by $\frac{1}{n+1}$. Since both random variables are bounded, their distributions are determined by their moments, so it suffices to show that the random variable $U$ defined above has $n^{th}$ moments precisely equal to $\frac{1}{n+1}$ for all $n \ge 1$.

To find the expectations $E[U^n] = E[e^{-nX}]$, we can use the composition formula which states that $E[\phi(X)] = \int \phi(x)\,\mu_X(dx)$. In particular, we have that
\[ E[U^n] = \int_0^\infty e^{-nx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^\infty e^{-(n+\lambda)x}\,dx = \frac{\lambda}{n + \lambda} = \frac{1}{n+1} \]
when $\lambda := 1$.














Remark 4.19. Note that if the integer-valued moments $E[X^m]$ for $m \ge 0$ of two random variables each match, then (at least when the Taylor series below converges, as it does for bounded random variables) so too must the characteristic functions of each rv for all $t \in \mathbb{R}$, and also the moment generating functions (MGFs), $M(t)$, must match for $t$ where it converges. This follows from the fact that the characteristic function of a rv $X$, $\varphi_X(t)$, has a Taylor series expansion in $t$ about 0 with
\[ E[X^j] = \frac{\varphi_X^{(j)}(0)}{i^j}, \quad j \ge 0. \]


re 











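Solution to Exercise (II): Compare characteristic functions of each rv. We can compute the characteristic functions of each of $U \sim \mathrm{Uniform}(0,1)$ and $e^{-X}$ with $X \sim \mathrm{Exp}(1)$, and show that they are the same pointwise for all $t \in \mathbb{R}$. Then, by the uniqueness theorem for characteristic functions, they have the same distribution. Let's do the computation, using the change of variables $u = e^{-x}$:

$$\varphi_{e^{-X}}(t) = E\big[e^{ite^{-X}}\big] = \int_0^{\infty} e^{ite^{-x}}\,e^{-x}\,dx = \int_0^1 e^{itu}\,du = \frac{e^{it} - 1}{it},$$

which is exactly the characteristic function of $\mathrm{Uniform}(0,1)$ for $t \ne 0$ (both functions equal $1$ at $t = 0$).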


Solution to Exercise (III): Compute the respective distribution functions of each rv. Recall that the distribution function of $X \sim \mathrm{Exp}(1)$ is

$$F_X(x) = \begin{cases} 1 - e^{-x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

For $u \in (0,1]$ we have $-\log u \in [0,\infty)$, so the distribution function of $e^{-X}$ is given by

$$F_{e^{-X}}(u) = P(e^{-X} \le u) = P(X \ge -\log u) = 1 - F_X(-\log u) = e^{\log u} = u.$$

Moreover, $F_{e^{-X}}(u) = 0$ for $u \le 0$ and $F_{e^{-X}}(u) = 1$ for $u \ge 1$, since $0 < e^{-X} \le 1$. In other words, the distribution functions of $e^{-X}$ and of a $\mathrm{Uniform}(0,1)$ random variable match for all $u \in \mathbb{R}$, and so the two random variables have the same distribution.

















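Exercise 4.20. For $\lambda, \mu > 0$, suppose that $S, T$ are independent rvs with $S \sim \mathrm{Exp}(\lambda)$ and $T \sim \mathrm{Exp}(\mu)$. Set $V := \min(S,T)$. Prove that (a) $V \sim \mathrm{Exp}(\lambda + \mu)$; and (b) $P(V = S) = \frac{\lambda}{\lambda+\mu}$.


Exercise 4.21. For fixed $n \ge 1$ and $1 \le i \le n$, suppose that the $T_i \sim \mathrm{Exp}(\lambda_i)$ are independent with $\lambda_i > 0$. Set $V := \min(T_1, T_2, \dots, T_n)$, and let $J$ denote the (random) index of the $T_i$ which is smallest. Prove that (a) $P(V > t) = \exp(-(\lambda_1 + \cdots + \lambda_n)t)$; (b) $P(J = i) = \frac{\lambda_i}{\lambda_1 + \cdots + \lambda_n}$; and (c) that $J$ and $V$ are independent.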
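
The two claims of Exercise 4.20 can be sanity-checked numerically before proving them. The following is a minimal simulation sketch, not part of the original notes; the function name `simulate_min_exponentials` and the parameter values are hypothetical choices made only for illustration.

```python
import random

def simulate_min_exponentials(lam, mu, trials, seed=0):
    """Estimate E[min(S, T)] and P(min(S, T) = S) for independent
    S ~ Exp(lam) and T ~ Exp(mu), both in the rate parametrization."""
    rng = random.Random(seed)
    total_min = 0.0
    s_wins = 0
    for _ in range(trials):
        s = rng.expovariate(lam)   # mean 1/lam
        t = rng.expovariate(mu)    # mean 1/mu
        total_min += min(s, t)
        if s < t:
            s_wins += 1
    return total_min / trials, s_wins / trials

mean_v, prob_s = simulate_min_exponentials(lam=2.0, mu=3.0, trials=200_000)
print(mean_v, 1 / (2.0 + 3.0))     # sample mean of V vs. 1/(lam + mu)
print(prob_s, 2.0 / (2.0 + 3.0))   # frequency of {V = S} vs. lam/(lam + mu)
```

If $V \sim \mathrm{Exp}(\lambda+\mu)$, the first pair of printed numbers should be close, and the second pair compares the empirical frequency of $\{V = S\}$ with $\lambda/(\lambda+\mu)$. Agreement is only evidence, not a proof.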


Example 4.22 (Spring 2019, #5). If X,, Xo,---,X,» are iid exponential random variables with 
parameter 1, compute the almost sure limit of 


ii n 
= S eX 2XK41 3 Xk 42 
v 
n 
k=1 


as n tends to infinity. 


More practice problems with other common distributions 


Example 4.23 (Spring 2017, #8). Assume X,, Xo,... are iid standard normal random variables. 
Show that for any A > 1/2, 





Example 4.24 (Fall 2016, #2). Let X be a random variable with continuous density function f 
and f(0) > 0. Let Y be a random variable with 


v= + if X > 0; 
0, otherwise, 


and let Y;, Yo,... be iid with distribution equal to that of Y. What is the value of the almost sure 
limit: 
Oe 5 a tae ee 
lin. ———___—_ 


noo n 


(See the second part of this argument.) 


Example 4.25 (Spring 2016, #3). Let X,, X2,... be iid uniform random variables on (0,1). Show 
that 
Ce a 


converges almost surely as n — oo, and compute this limit. 




4.5 Expectation 


Definition 4.26. If $X = \chi_A$ is the indicator of an event $A$, then $E[X] = P(A)$. We also use the notation $E[X, A] := E[X\chi_A]$ to denote the expectation of $X$ restricted to $A$. A random variable $X$ is called simple if $X = \sum_{i=1}^n a_i\chi_{A_i}$, where the $A_i$'s are disjoint. This implies that $E[X] = \sum_{i=1}^n a_iP(A_i)$. If $X \ge 0$, then $E[X] = \int X\,dP$. If $X$ is an arbitrary random variable, then $X = X_+ - X_-$ where $X_+ = \max(X,0)$ and $X_- = \max(-X,0)$. Here we have that $|X| = X_+ + X_-$ (cf. total variation of a function). We say that $X$ is integrable if $E[X_+] < \infty$ and $E[X_-] < \infty$.


Proposition 4.27 (Tail-Sum Formula for Expectation). If $Y \ge 0$ is a continuous random variable, then

$$E[Y] = \int_0^{\infty} P(Y > t)\,dt.$$


Lemma 4.28 (Necessary Condition for Finiteness of Expectation). Suppose that $X$ is a random variable. Then

$$E[|X|] < \infty \iff \sum_{n \ge 1} P(|X| \ge n) < \infty.$$


We make the following observations: 


• Note that for $X$ simple, $E[X]$ does not depend on the representation of $X$. That is, if $X = \sum_i a_i\chi_{A_i} = \sum_j b_j\chi_{B_j}$, then $\sum_i a_iP(A_i) = \sum_j b_jP(B_j)$.


• If $X \ge 0$, $\exists$ a sequence $(X_n)_{n\ge1}$ of simple random variables such that $X_n \nearrow X$ and

$$\lim_{n\to\infty} E[X_n] = E[X],$$

where this limit is possibly infinite (cf. the monotone convergence theorem).


• To approximate a general random variable $X \ge 0$ by simple functions, use a limiting procedure such as the standard dyadic one:

$$X_n := \sum_{k=0}^{n2^n - 1} \frac{k}{2^n}\,\chi_{\{k/2^n \le X < (k+1)/2^n\}} + n\,\chi_{\{X \ge n\}},$$

so that $X_n \nearrow X$ and $\lim_{n\to\infty} X_n = X$.
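
The approximation of a non-negative random variable by simple ones can be made concrete pointwise. The sketch below assumes the standard dyadic construction (round down to a multiple of $2^{-n}$ and cap at $n$); the helper name `dyadic_approx` is hypothetical and is applied to a single value $x = X(\omega)$.

```python
import math

def dyadic_approx(x, n):
    """n-th simple-function approximation of a value x >= 0:
    round x down to a multiple of 2**-n, and cap the result at n."""
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = math.pi
approximations = [dyadic_approx(x, n) for n in range(1, 8)]
print(approximations)   # a non-decreasing list of values that never exceed x
```

Each approximation takes only finitely many values, is non-decreasing in $n$, and is within $2^{-n}$ of $x$ once $n > x$, which is what the monotone convergence theorem needs.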





Expectation of a Discrete Random Variable: Suppose that $X : \Omega \to \{a_1, a_2, a_3, \dots\}$ is a discrete random variable. Then

$$E[X] = \sum_{j \ge 1} a_j\,P(X = a_j).$$


Proposition 4.29. The random variable $X$ is integrable $\iff E[|X|] < \infty$, i.e., $X$ is integrable if and only if $|X|$ is integrable.


Proposition 4.30 (Properties of Expectation). We have the following key properties: 


(1) $E[X] \ge 0$ if $X \ge 0$.

(2.1) If $X \ge 0$, then $E[X] = 0 \implies X = 0$ almost surely.

(2.2) If $X$ is any random variable taking values in $(-\infty,\infty)$ such that $X$ is square integrable, then $X = 0$ a.s. $\iff E[X] = E[X^2] = 0$.

(3) $E[\alpha X + \beta Y] = \alpha E[X] + \beta E[Y]$.

(4) (Monotone convergence theorem) If $X_n \ge 0$ and $X_n \nearrow X$, then $E[X_n] \nearrow E[X]$.

(5) $|E[X]| \le E[|X|]$.

(6) If $X, Y$ are independent and integrable, then $XY$ is integrable, and $E[XY] = E[X]E[Y]$.

(7.1) If $X, Y$ are such that for all bounded, smooth functions $\varphi, \psi : \mathbb{R} \to \mathbb{R}$, $E[\varphi(X)\psi(Y)] = E[\varphi(X)]E[\psi(Y)]$, then $X, Y$ are independent.

(7.2) $E[XY]^2 \le E[X^2]E[Y^2]$.

(8) (Fatou's Lemma) If $X_n \ge 0$, then $E[\liminf_{n\to\infty} X_n] \le \liminf_{n\to\infty} E[X_n]$.

(9) (Dominated Convergence Theorem) If $|X_n| \le Y$ for all $n \ge 1$ with $Y$ integrable, and if $X_n \to X$ almost surely, then $E[X_n] \to E[X]$.

(10) The $r$-th moments are given by a sum over the factorial moments and the Stirling numbers of the second kind as
$$E[X^r] = \sum_{j=0}^{r} \left\{ {r \atop j} \right\} E[(X)_j], \qquad (X)_j := X(X-1)\cdots(X-j+1).$$


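Theorem 4.31. If $X$ has distribution $\mu_X$ on $\mathbb{R}$, then

$$E[X] = \int_{\mathbb{R}} x\,\mu_X(dx).$$

Thus to compute $E[X]$ we only need to know the distribution of $X$. Also, $E[\varphi(X)] = \int \varphi(x)\,\mu_X(dx)$ for any Borel measurable function $\varphi$. That is, if $f_X(x)$ is the density (PDF) of $X$, then

$$E[\varphi(X)] = \int_{-\infty}^{\infty} f_X(x)\,\varphi(x)\,dx.$$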


Proposition 4.32 (Key Lemmas and Inequalities). We have the following key named lemmas and 
inequalities: 














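(A) (Fubini's Theorem) Let $(\Omega, \mathcal{F}, P) := (\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2, P_1 \otimes P_2)$. Suppose that $F : \Omega \to \mathbb{R}$ is integrable with sections $F_{\omega_1}(\omega_2) := F(\omega_1,\omega_2)$ and $F^{\omega_2}(\omega_1) := F(\omega_1,\omega_2)$. Then

$$\iint F(\omega_1,\omega_2)\,P_1(d\omega_1)\,P_2(d\omega_2) = \int\Big(\int F_{\omega_1}(\omega_2)\,P_2(d\omega_2)\Big)P_1(d\omega_1) = \int\Big(\int F^{\omega_2}(\omega_1)\,P_1(d\omega_1)\Big)P_2(d\omega_2).$$

(B) (Markov's Inequality) If $X \ge 0$ and $\lambda > 0$, then $P(X \ge \lambda) \le \frac{1}{\lambda}E[X]$. Indeed, we see that

$$P(X \ge \lambda) = E[\chi_{\{X \ge \lambda\}}] \le E\Big[\frac{X}{\lambda}\chi_{\{X \ge \lambda\}}\Big] \le \frac{E[X]}{\lambda}.$$

That is, if $X \ge 0$ is integrable, then the tail probability $P(X \ge \lambda)$ decays at least as fast as $1/\lambda$ for $\lambda$ large.

(C) (Chebyshev) If $X^2$ is integrable and $\lambda > 0$, then $P(|X - E[X]| \ge \lambda) \le \frac{\mathrm{Var}(X)}{\lambda^2}$. The moral here is that if $X^2$ is integrable, then the tail probability $P(|X - E[X]| \ge \lambda)$ decays at least like $1/\lambda^2$ for large $\lambda > 0$.

(D) If $X \in L^p$, then $P(|X| \ge \lambda) \le \frac{E[|X|^p]}{\lambda^p}$. Indeed, $\forall \lambda > 0$: $P(|X| \ge \lambda) = P(|X|^p \ge \lambda^p) \le \lambda^{-p}E[|X|^p]$.

(E) If $E[e^{a|X|}] < \infty$ for some $a > 0$, then $P(|X| \ge \lambda) \le Ce^{-a\lambda}$ for all $\lambda > 0$, where one can take $C := E[e^{a|X|}]$.

(F) (Jensen's Inequality) Let $X = (X_1,\dots,X_n) : \Omega \to D$ with $D \subseteq \mathbb{R}^n$ a convex set. If $\varphi : D \to \mathbb{R}$ is convex such that $\varphi(X)$ is integrable, and if $X$ is integrable, then $E[\varphi(X)] \ge \varphi(E[X])$.

Special cases: The function $\varphi(x) = |x|^p$ is convex for $p \ge 1$. Then by Jensen's inequality: $E[|X|^p] \ge |E[X]|^p$. Other convex functions include: $\varphi(x) := |x|$, $e^x$, $-\log x$, $\frac{1}{x}$ (the last two on $x > 0$).

(G) (Portmanteau's theorem) Suppose that $|X_n - X| \to 0$ in probability (and hence in distribution). For any bounded continuous $f : \mathbb{R} \to \mathbb{R}$, one has that $E[f(|X_n - X|)] \to f(0)$.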
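
To see how loose or tight the Markov and Chebyshev bounds are, it helps to compare them with an exactly computable tail. The sketch below uses $X \sim \mathrm{Exp}(1)$ (mean $1$, variance $1$) as an assumed test case; the function names are hypothetical.

```python
import math

def exp1_tail(lam):
    """Exact tail P(X >= lam) for X ~ Exp(1)."""
    return math.exp(-lam) if lam > 0 else 1.0

def markov_bound(lam, mean=1.0):
    """Markov: P(X >= lam) <= E[X] / lam."""
    return mean / lam

def chebyshev_bound(lam, var=1.0):
    """Chebyshev: P(|X - E[X]| >= lam) <= Var(X) / lam**2."""
    return var / lam**2

for lam in (1.0, 2.0, 5.0, 10.0):
    # For Exp(1) and lam >= 1, {|X - 1| >= lam} has the same probability
    # as {X >= 1 + lam}, because X >= 0 almost surely.
    print(lam, exp1_tail(lam), markov_bound(lam),
          exp1_tail(1.0 + lam), chebyshev_bound(lam))
```

The exact tails decay exponentially, in line with the exponential-moment bound, while the Markov and Chebyshev bounds only decay polynomially; the bounds are valid but far from sharp for this distribution.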




















Proposition 4.33 (Generalized Chebyshev Inequality). Let $f : [0,\infty) \to [0,\infty)$ be a non-decreasing Borel function, and let $X \ge 0$ be a non-negative random variable. Then for all $a \ge 0$ with $f(a) > 0$,

$$P(X \ge a) \le \frac{E[f(X)]}{f(a)},$$

provided that $f(X)$ is integrable.


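Example 4.34 (Other forms of the Chebyshev inequality). Provided that $\mathrm{Var}(X) = \sigma^2 < \infty$, with $E[X] = \mu$ and any $k, \epsilon > 0$, we have that

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \qquad (4.1a)$$
$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2} \qquad (4.1b)$$
$$P(|X - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{\epsilon^2}. \qquad (4.1c)$$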
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Proposition 4.35 (Second moment method). If X has finite variance and X > 0, then 





$$P(X > 0) \ge \frac{E[X]^2}{E[X^2]}.$$

If there are positive constants $c_1, c_2 > 0$ such that $E[X^2] \le c_1E[X]^2$ and $E[X] \ge c_2$, then $\forall \theta \in (0,1)$:

$$P(X \ge c_2\theta) \ge \frac{(1-\theta)^2}{c_1}.$$


Example 4.36 (Spring 2019, #1). Let X be a non-negative random variable such that 0 < E[X] < 
+oo, and let 0 < x < 1. Show that 


Example 4.37 (Spring 2015, #1). Assume X is a symmetric random variable such that E[X?] = 1 


and E[X*] = 2. Show that 


14 
P(X >l1)<—. 
( 2YsS5 


Example 4.38 (Spring 2015, #5). for a sequence, X), X2,..., of random variables, suppose that 
we know that 
S " nE||Xnl] < 00. 


n>1 


Show that the sequence Y, = X, + Xn41 +--+: + Xjon converges almost surely and in L! to 0. 
(See for uses of Borel-Cantelli to guarantee a.s. convergence.) 
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Example 4.39 (Fall 2019, #10). Let X,Y,Z be independent and uniformly distributed on (0, 1]. 
Compute E[min(X, Y, Z)]. 


Theorem 4.40. If $X \ge 0$, then

$$E[X] = \int_0^{\infty} P(X \ge \lambda)\,d\lambda.$$


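Corollary 4.41. If $X \ge 0$ and $P(X \ge \lambda) \le \frac{C}{\lambda^{1+\epsilon}}$ for all $\lambda \ge 1$ and some constants $C, \epsilon > 0$, then $E[X] < \infty$.


Proof. We have that

$$E[X] = \int_0^{\infty} P(X \ge \lambda)\,d\lambda \le C\int_1^{\infty} \lambda^{-1-\epsilon}\,d\lambda + \int_0^1 P(X \ge \lambda)\,d\lambda \le \frac{C}{\epsilon} + 1 < \infty.$$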

















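Example 4.42. Let $X \sim \mathrm{Cauchy}$ so that $f_X(x) = \frac{1}{\pi(1+x^2)}$ for $x \in \mathbb{R}$. Then

$$P(|X| \ge \lambda) = \frac{1}{\pi}\int_{|x| \ge \lambda} \frac{dx}{1+x^2} = \frac{2}{\pi}\int_{\lambda}^{\infty} \frac{dx}{1+x^2},$$

where, for $\lambda \ge 1$,

$$\frac{1}{2}\int_{\lambda}^{\infty} \frac{dx}{x^2} \le \int_{\lambda}^{\infty} \frac{dx}{1+x^2} \le \int_{\lambda}^{\infty} \frac{dx}{x^2} = \frac{1}{\lambda}.$$

This implies that

$$\frac{1}{\pi\lambda} \le P(|X| \ge \lambda) \le \frac{2}{\pi\lambda},$$

which implies that $\lambda P(|X| \ge \lambda)$ is bounded. But $E[|X|] = +\infty$!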
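
The Cauchy example can be checked with closed forms, since $\int dx/(1+x^2) = \arctan x$. The sketch below is an illustration under that assumption; the function names `cauchy_tail` and `truncated_abs_mean` are hypothetical.

```python
import math

def cauchy_tail(lam):
    """P(|X| >= lam) for a standard Cauchy X, from the arctan antiderivative."""
    return 1.0 - (2.0 / math.pi) * math.atan(lam)

def truncated_abs_mean(R):
    """E[|X|; |X| <= R] = (1/pi) * log(1 + R**2), which grows without bound in R."""
    return math.log(1.0 + R * R) / math.pi

for lam in (1.0, 10.0, 100.0, 1000.0):
    print(lam, lam * cauchy_tail(lam), truncated_abs_mean(lam))
```

The second column stays bounded and approaches $2/\pi$, while the third column grows like $(2/\pi)\log\lambda$. This illustrates that a tail bound of order $1/\lambda$ alone is not enough for integrability.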


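Theorem 4.43 (Kolmogorov's inequality). Let $(X_k)_{k\ge1}$ be independent random variables with $E[X_k] = 0$ and $\mathrm{Var}(X_k) < \infty$, and let $S_k := X_1 + \cdots + X_k$. Then for $\lambda > 0$, we have

$$P\Big(\max_{1 \le k \le n} |S_k| \ge \lambda\Big) \le \frac{\mathrm{Var}(S_n)}{\lambda^2} = \frac{1}{\lambda^2}\sum_{k=1}^{n} \mathrm{Var}(X_k).$$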


Example 4.44 (Fall 2018, #6). Let $X_1, \dots, X_n, \dots$ be identically distributed (but not necessarily independent) random variables with finite first moment. Is the following,

$$n^{-1}E\Big[\max_{1 \le k \le n} |X_k|\Big] \to 0,$$

as $n \to \infty$, true or false?
(Uses dominated convergence.)


Example 4.45 (Fall 2016, #7). Let $X$ be a random variable taking values in the interval $[1,2]$. Find sharp lower and upper estimates on the quantity $E[X]\,E[\frac{1}{X}]$. Provide an example of a random variable for which each of the lower and upper estimates is attained.

Hints: 


e mix (eb): 
• $ab \le \frac{1}{4}(a+b)^2$.
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4.6 Variance and covariance 


Definition 4.46. We define the variance of a random variable $X$ to be

$$\mathrm{Var}(X) = E[X^2] - E[X]^2, \qquad \sigma_X := \sqrt{\mathrm{Var}(X)}.$$

For two random variables $X, Y$ we define their covariance to be $\mathrm{Cov}(X,Y) := E[(X - E[X])(Y - E[Y])]$ so that $\mathrm{Cov}(X,X) = \mathrm{Var}(X)$. If $X = (X_1,\dots,X_n)$ is a random vector, then we define its corresponding covariance matrix $\mathrm{Cov}_X$ to be the $n \times n$ symmetric matrix whose $(i,j)$ entry is given by $\mathrm{Cov}(X_i, X_j)$. We also define the correlation of $X$ and $Y$ to be $\mathrm{Corr}(X,Y) = \mathrm{Cov}(X,Y)/(\sigma_X\sigma_Y)$.


Proposition 4.47 (Properties). We have the next properties of the variance and covariance: 


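(1) $\mathrm{Var}(X) \ge 0$ with equality iff $X$ is constant almost surely.

(2) $\mathrm{Var}(X + b) = \mathrm{Var}(X)$, $\forall b \in \mathbb{R}$.

(3) For all $a \in \mathbb{R}$, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

(4) $\mathrm{Cov}(X + Z, Y) = \mathrm{Cov}(X,Y) + \mathrm{Cov}(Z,Y)$ and $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$.

(5) For any random variables $X, Y$, $-\sigma_X\sigma_Y \le \mathrm{Cov}(X,Y) \le \sigma_X\sigma_Y$, where equality on the right is attained iff $Y = aX + b$ for some $a > 0$; and on the left iff $Y = -aX + b$ for some $a > 0$.

(6) $\mathrm{Corr}(X,Y) = \mathrm{Corr}(aX + b, cY + d)$ for all $a, c$ with $ac > 0$ (the sign of the correlation flips if $ac < 0$).

(7) If $X, Y$ are integrable and independent, then $E[XY] = E[X]E[Y] \implies \mathrm{Cov}(X,Y) = 0$.

(8) If $X_1,\dots,X_n$ are independent and in $L^2$, then $\mathrm{Var}(a_1X_1 + \cdots + a_nX_n) = a_1^2\,\mathrm{Var}(X_1) + \cdots + a_n^2\,\mathrm{Var}(X_n)$.

(9) $\mathrm{Var}(\mathbf{1}_A) = P(A)(1 - P(A))$.

(10) $\mathrm{Cov}(\mathbf{1}_A, \mathbf{1}_B) = P(A \cap B) - P(A)P(B)$.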
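
Properties (9) and (10) for indicators can be verified exactly on a small finite probability space. The sketch below assumes a single fair die as the sample space; the events `A`, `B` and the helper names are hypothetical choices made for illustration.

```python
from fractions import Fraction

OMEGA = range(1, 7)                      # outcomes of one fair die (uniform measure)

def prob(event):
    """P(event) under the uniform measure on OMEGA; event is a set of outcomes."""
    return Fraction(sum(1 for w in OMEGA if w in event), len(OMEGA))

def expect(f):
    """E[f] for a function f on OMEGA."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)

def cov(f, g):
    """Cov(f, g) = E[fg] - E[f]E[g]."""
    return expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)

A = {2, 4, 6}                            # "even"
B = {1, 2, 3}                            # "at most three"
ind_A = lambda w: 1 if w in A else 0
ind_B = lambda w: 1 if w in B else 0

print(cov(ind_A, ind_A), prob(A) * (1 - prob(A)))          # property (9)
print(cov(ind_A, ind_B), prob(A & B) - prob(A) * prob(B))  # property (10)
```

Exact rational arithmetic avoids any rounding, so the two entries on each printed line agree exactly. Here the covariance is negative because knowing the roll is even makes "at most three" less likely.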


4.7 Uniform integrability 


Definition 4.48. The family $(X_i)_{i \in I}$ is uniformly integrable if

$$\sup_{i \in I} E[|X_i|, |X_i| \ge R] \to 0,$$

as $R \to \infty$. The interpretation of this definition is that the tails of the expectation are controlled uniformly.

Proposition 4.49. If $(X_i)_{i \in I}$ are iid and $X_1$ is integrable, then $(X_i)$ is uniformly integrable, i.e.,
$$\sup_{i \in I} E[|X_i|, |X_i| \ge R] = E[|X_1|, |X_1| \ge R] \to 0,$$

as $R \to \infty$.


Proof. For a fixed $R > 0$, let $Y_R := |X_1|\chi_{\{|X_1| \ge R\}}$. Then $|Y_R| \le |X_1|$ for all $R > 0$, and $Y_R \to 0$ almost surely as $R \to \infty$. By dominated convergence, this in turn implies that $E[Y_R] \to 0$ as $R \to \infty$.
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Proposition 4.50. [f4 Y such that |X;| < Y for alli > 1 with Y integrable, then (X;) is uniformly 
integrable. 


Proof. Since $\{|X_i| \ge R\} \subseteq \{Y \ge R\}$, we have that

$$E[|X_i|, |X_i| \ge R] \le E[Y, |X_i| \ge R] \le E[Y, Y \ge R] \to 0,$$

as $R \to \infty$ (by integrability of $Y$). The bound does not depend on $i$, so $\sup_{i \in I} E[|X_i|, |X_i| \ge R] \to 0$ as $R \to \infty$, as claimed.





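Proposition 4.51. If $\sup_{i \in I} E[|X_i|^a] < \infty$ for some $a > 1$, then $(X_i)_{i \in I}$ is uniformly integrable.


Proof. We see that

$$E[|X_i|, |X_i| \ge R] = E[|X_i|\chi_{\{|X_i| \ge R\}}] \le E\Big[|X_i|\Big(\frac{|X_i|}{R}\Big)^{a-1}\chi_{\{|X_i| \ge R\}}\Big] = E\Big[\frac{|X_i|^a}{R^{a-1}}\chi_{\{|X_i| \ge R\}}\Big] \le \frac{E[|X_i|^a]}{R^{a-1}},$$

which implies that
$$\sup_{i \in I} E[|X_i|, |X_i| \ge R] \le \frac{\sup_{i \in I} E[|X_i|^a]}{R^{a-1}} \to 0,$$

as $R \to \infty$.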





Example 4.52 (Spring 2019, #4). Let $(X_n)_{n\ge1}$ be a sequence of non-negative uniformly integrable random variables such that, as $n \to +\infty$, $X_n \Rightarrow X$. Show that $X$ is integrable and that $\lim_{n\to\infty} E[X_n] = E[X]$.

(Note: The given proof is very careful with respect to tying up loose ends with analysis. Uses Fatou's lemma.)

(Fact: By uniform integrability, $(E[X_n])_{n\ge1}$ is bounded!)


4.8 Convergence and expectation 


First, some preliminary notes: we recall the subset containment properties of limsups of sets given in Proposition 1.5. Also, we restate the precise definition of an exponential random variable here for clarity of exposition: $X \sim \mathrm{Exp}(\lambda)$ is exponential if $\mu_X$ has a density function of the form

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$


Now we arrive at the focus of this subsection which is to dig into the next problem on iid exponential 
random variables: 
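Problem Statement: We claim that if $X_1, \dots, X_n, \dots$ are iid with $X_n \sim \mathrm{Exp}(1)$, then

$$P\Big(\limsup_{n\to\infty} \frac{X_n}{\log n} = 1\Big) = 1.$$

We notice that a proof of this result is equivalent to showing that

(A) $P\big[\limsup_{n\to\infty} \frac{X_n}{\log n} \ge 1 + \delta\big] = 0$, $\forall \delta > 0$; and then similarly that

(B) $P\big[\limsup_{n\to\infty} \frac{X_n}{\log n} \ge 1 - \delta\big] = 1$, $\forall \delta > 0$.

Note next that

$$\Big\{\limsup_{n\to\infty} \frac{X_n}{\log n} = 1\Big\} = \bigcap_{k \ge 1} A_k,$$

where

$$A_k := \Big\{1 + \frac{1}{k} > \limsup_{n\to\infty} \frac{X_n}{\log n} \ge 1 - \frac{1}{k}\Big\},$$

for $k \ge 1$. Also, we have that $A_k \searrow \{\limsup_{n\to\infty} \frac{X_n}{\log n} = 1\}$, and (A), (B) with $\delta = 1/k$ give $P(A_k) = 1$ for every $k$. Moving along, for $n \ge 2$ we can see that

$$P\Big[\frac{X_n}{\log n} \ge 1 + \delta - \epsilon\Big] = e^{-(1+\delta-\epsilon)\log n} = \frac{1}{n^{1+\delta-\epsilon}},$$

so that

$$\sum_{n \ge 2} P\Big[\frac{X_n}{\log n} \ge 1 + \delta - \epsilon\Big] = \sum_{n \ge 2} \frac{1}{n^{1+\delta-\epsilon}} < \infty, \quad \text{if } \epsilon < \delta.$$

So in summary, to complete the proof of (A), we use Borel-Cantelli in the form of

$$P\Big[\limsup_{n\to\infty} \frac{X_n}{\log n} \ge 1 + \delta\Big] \le P\Big[\limsup_{n\to\infty}\Big\{\frac{X_n}{\log n} \ge 1 + \delta - \epsilon\Big\}\Big] = 0.$$

To show (B), we need to consider lower bounds: $\forall \epsilon > 0$,

$$\limsup_{n\to\infty}\Big\{\frac{X_n}{\log n} \ge 1 - \delta + \epsilon\Big\} \subseteq \Big\{\limsup_{n\to\infty} \frac{X_n}{\log n} \ge 1 - \delta\Big\}.$$

To use the second part of Borel-Cantelli on independent events, we want that

$$\sum_{n \ge 2} P\Big[\frac{X_n}{\log n} \ge 1 - \delta + \epsilon\Big] = \sum_{n \ge 2} \frac{1}{n^{1-\delta+\epsilon}} = \infty,$$

which happens if $\epsilon \le \delta$. Then this implies that

$$P\Big[\limsup_{n\to\infty}\Big\{\frac{X_n}{\log n} \ge 1 - \delta + \epsilon\Big\}\Big] = 1, \quad \text{for } 0 < \epsilon \le \delta,$$

which together with the containment above gives (B).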
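
The whole argument hinges on which of the series $\sum_n n^{-c}$ converge, since $P(X_n \ge c\log n) = n^{-c}$ for $X_n \sim \mathrm{Exp}(1)$. The following sketch compares partial sums for an exponent above and below $1$; the values of `delta` and `eps` and the function name are hypothetical choices for illustration.

```python
def partial_sum(exponent, n_terms):
    """Partial sum of sum_{n >= 1} n**(-exponent); the n-th term equals
    P(X_n >= exponent * log n) for X_n ~ Exp(1)."""
    return sum(n ** (-exponent) for n in range(1, n_terms + 1))

delta, eps = 0.5, 0.25
for n_terms in (10**2, 10**4, 10**6):
    convergent = partial_sum(1 + delta - eps, n_terms)   # exponent 1.25 > 1
    divergent = partial_sum(1 - delta + eps, n_terms)    # exponent 0.75 < 1
    print(n_terms, convergent, divergent)
```

The first column of sums stays bounded (first Borel-Cantelli lemma applies), while the second grows like $4N^{1/4}$ (second Borel-Cantelli lemma applies, using independence).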
og n 


n—-> co 


Example 4.53 (Spring 2018, #1). Let $(X_n)_{n\ge1}$ be a sequence of iid random variables with $X_n \sim \mathrm{Exp}(\lambda)$, i.e., for $t \ge 0$ we have that $P(X_n > t) = e^{-\lambda t}$ where $\lambda > 0$. Prove that

$$\limsup_{n\to\infty} \frac{X_n}{\log n} < \infty, \quad \text{a.s.}$$


(Hint: Think easy application of Borel-Cantelli.) 


4.9 Characteristic functions of a random variable 


Definition 4.54 (Characteristic Function). The characteristic function of a random variable $X$ is the function $\varphi_X : \mathbb{R} \to \mathbb{C}$ defined by $\varphi_X(t) := E[e^{itX}]$.














We have the following properties of the characteristic function @x of any random variable X: 
• We have that $\varphi_X(0) = 1$ and for all $t$: $|\varphi_X(t)| = |E[e^{itX}]| \le E[|e^{itX}|] = 1$;

• The characteristic function $\varphi_X(t)$ is a uniformly continuous function of $t \in \mathbb{R}$;

• If $Y = aX + b$ for constants $a, b \in \mathbb{R}$, then
$$\varphi_Y(t) = E[e^{it(aX+b)}] = e^{itb}\varphi_X(at);$$

• The function $M_X(t) = E[e^{tX}]$ is called the moment generating function of $X$. Unlike the characteristic function, this GF does not necessarily exist for all $t \in \mathbb{R}$. For those $t$ such that $M_X$ does converge, we have the relation that $\varphi_X(t) = M_X(it)$;














e We have that for k > 0 (not necessarily an integer), 














• If $X$ has a density $f$, then
$$\varphi_X(t) = \int_{-\infty}^{\infty} e^{itx}f(x)\,dx,$$
which is the Fourier transform of $f$ at $-t$;

• If two random variables have the same characteristic function, then they have the same underlying distribution;

• If $X_1, \dots, X_n$ are independent random variables, then
$$\varphi_{X_1 + \cdots + X_n}(t) = \varphi_{X_1}(t) \cdots \varphi_{X_n}(t);$$

• If $X \sim N(0,1)$, then $\varphi_X(t) = e^{-t^2/2}$. Similarly, if $Y \sim N(\mu, \sigma^2)$, then $Y$ has the same distribution as $\sigma X + \mu$, and hence $\varphi_Y(t) = e^{i\mu t}e^{-(\sigma t)^2/2}$;

• If the characteristic functions of $X_n$ converge pointwise to that of $X$, then $X_n \Rightarrow X$ (weakly);

• If $X, Y$ are independent, then the characteristic function of $X + Y$ equals the product of their characteristic functions: $\varphi_{X+Y}(u) = \varphi_X(u)\varphi_Y(u)$;

• If $X_1, \dots, X_n$ are iid and $S_n := \frac{X_1 + \cdots + X_n}{n}$, then $\varphi_{S_n}(t) = \varphi_{X_1}(\frac{t}{n})^n$.

















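Distribution | Ch. Func | MGF
Ber(p) | $1 - p + pe^{it}$ | $1 - p + pe^{t}$
Binom(n,p) | $(1 - p + pe^{it})^n$ | $(1 - p + pe^{t})^n$
Poisson($\lambda$) | $\exp(\lambda(e^{it} - 1))$ | $\exp(\lambda(e^{t} - 1))$
Uniform(a,b) | $\frac{e^{itb} - e^{ita}}{it(b-a)}$ | $\frac{e^{tb} - e^{ta}}{t(b-a)}$
Exp($\lambda$) | $(1 - \frac{it}{\lambda})^{-1}$ | $(1 - \frac{t}{\lambda})^{-1}$, $t < \lambda$
N(0,1) | $e^{-t^2/2}$ | $e^{t^2/2}$
N(0,$\sigma^2$) | $e^{-(\sigma t)^2/2}$ | $e^{(\sigma t)^2/2}$
N($\mu$,$\sigma^2$) | $e^{i\mu t - (\sigma t)^2/2}$ | $e^{\mu t + (\sigma t)^2/2}$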
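
Entries of a table like this can be spot-checked by integrating $e^{itx}$ against the density numerically. The sketch below does this for the $\mathrm{Exp}(\lambda)$ row with a plain midpoint rule; the truncation point `upper`, the step count and the function names are assumptions chosen for illustration.

```python
import cmath
import math

def exp_char_fn_numeric(t, lam=1.0, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of E[exp(i t X)] for X ~ Exp(lam),
    i.e. of the integral of exp(i t x) * lam * exp(-lam x) over [0, upper]."""
    h = upper / steps
    total = 0j
    for k in range(steps):
        x = (k + 0.5) * h
        total += cmath.exp(1j * t * x) * lam * math.exp(-lam * x)
    return total * h

def exp_char_fn_closed(t, lam=1.0):
    """Closed form (1 - i t / lam)**(-1) for the Exp(lam) characteristic function."""
    return 1.0 / (1.0 - 1j * t / lam)

for t in (0.0, 0.5, 2.0):
    print(t, exp_char_fn_numeric(t), exp_char_fn_closed(t))
```

The same pattern works for any row with a density; only the integrand and the integration range change.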























Proposition 4.55 (Moments of a Random Variable). Let $\varphi$ be the characteristic function of $X$. If $E[|X|] < \infty$, then $\varphi$ is continuously differentiable, and

$$\varphi'(0) = iE[X].$$
If $E[|X|^k] < \infty$ for some positive integer $k \ge 1$, then $\varphi$ has $k$ continuous derivatives and

$$\varphi^{(j)}(0) = i^jE[X^j], \quad \forall\, 0 \le j \le k.$$
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Theorem 4.56 (Inversion). The PDF of X, fx(x), is given by the inversion formula 


$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi_X(t)\,dt.$$


Similarly, if g is a Borel function such that g(X) is integrable, then 


Sla(X)] = 5 f Glox(at 
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where 


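We also have that, for $u > 0$,

$$P\Big\{|X| \ge \frac{2}{u}\Big\} \le \frac{1}{u}\int_{-u}^{u} (1 - \varphi(t))\,dt.$$

• If $E[X^2] < \infty$, then as $t \to 0$,

$$\varphi(t) = 1 + itE[X] - \frac{t^2}{2}E[X^2] + o(t^2).$$

• If $\limsup_{h \downarrow 0} \frac{1}{h^2}[\varphi(h) - 2\varphi(0) + \varphi(-h)] > -\infty$, then $E|X|^2 < \infty$.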


Examples involving characteristic functions 


Example 4.57 (Spring 2018, #9). Find an example of a random variable with a density function, but whose characteristic function $\varphi_X$ satisfies

$$\int_{-\infty}^{\infty} |\varphi_X(t)|\,dt = \infty.$$


Example 4.58 (Fall 2018, #4). Let $\varphi$ be the characteristic function of a random variable $X$. Show that

$$\psi_1(t) = |\varphi(t)|^2, \qquad \psi_2(t) = \frac{1}{t}\int_0^t \varphi(s)\,ds$$

are also characteristic functions.

(Hint: $\int_0^1 E(e^{itXU} \mid U = u)\,du = E\,E(e^{itXU} \mid U)$.)


Example 4.59 (Spring 2017, #1). Show that if $X_n$ and $Y_n$ are independent for $n = 1, 2, \dots$ and $X_n \Rightarrow X$ and $Y_n \Rightarrow Y$, where $X, Y$ are independent, then $X_n + Y_n \Rightarrow X + Y$.


































Example 4.60 (Spring 2017, #5). Show that the characteristic function ¢ of a random variable is 
real if and only if X and —X have the same distribution. 


Example 4.61 (Fall 2016, #5). Let £1,& be independent random variables with respective char- 
acteristic functions ; ; 
— Ww +1 
di(u) = Tae fl) =F ange 
Find the probability that €, + € takes values in (3, 00). 


Example 4.62 (Spring 2016, #5). Let $(N_t)_{t \ge 0}$ be a rate-$\lambda$ Poisson process. Let $X_1, X_2, \dots$ be iid random variables with $E[|X_1|] < \infty$, and define

$$S_t := \sum_{i=1}^{N_t} X_i.$$

Show that $S_t/t$ converges in probability to a constant and compute this constant.
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4.10 The Levy continuity theorem 


We assume the following hypotheses: 


• For $(X_n)_{n\ge1}$ a sequence of random variables, we define the respective characteristic functions by $\varphi_n(t) := E[e^{itX_n}]$;

• Suppose that $\varphi_n(t) \to \varphi(t)$ pointwise for all $t \in \mathbb{R}$.

Then the Lévy continuity theorem states that TFAE:

• $X_n \Rightarrow X$ for some random variable $X$, i.e., the cdfs converge at every point of continuity of the limiting cdf;

• The sequence $(X_n)_{n\ge1}$ is so-called tight:
$$\lim_{L\to\infty}\Big(\sup_n P(|X_n| \ge L)\Big) = 0;$$

• The function $\varphi(t)$ is the characteristic function of some (limiting) random variable $X$;

• $\varphi(t)$ is continuous in $t$;

• $\varphi(t)$ is continuous at $t := 0$.




5 Modes of convergence 


5.1 Definitions of convergence 





Definition 5.1 (Convergence Everywhere). Let $X_n, X : \Omega \to \mathbb{R}$. Then $X_n \to X$ if $X_n(\omega) \to X(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$.











Definition 5.2 (Almost Sure Convergence). We say that an event $A \subseteq \Omega$ is almost sure if $P(A) = 1$, while noting that this property is distinguished from "provably certain" in so much as there may be outcomes in the sample space which lie outside of $A$, but these outcomes form a subset of measure zero. Now, we write $X_n \xrightarrow{a.s.} X$ if $P(\{\omega : X_n(\omega) \to X(\omega)\}) = 1$. Equivalently, $X_n \xrightarrow{a.s.} X$ if $\exists A \in \mathcal{F}$ an almost sure event such that $X_n(\omega) \to X(\omega)$, $\forall \omega \in A$. We have almost sure convergence if, almost surely,

$$\limsup_{n\to\infty} X_n = \liminf_{n\to\infty} X_n$$

with a finite common value, in which case we may write that

$$X = \lim_{n\to\infty} X_n = \limsup_{n\to\infty} X_n = \liminf_{n\to\infty} X_n.$$


Proposition 5.3. $X_n \xrightarrow{a.s.} X \iff \forall \epsilon > 0$, $P(\limsup_{n\to\infty}\{|X_n - X| \ge \epsilon\}) = 0$. An important corollary is that if $\forall \epsilon > 0$, $\sum_{n \ge 1} P(|X_n - X| \ge \epsilon) < \infty$, then $X_n \xrightarrow{a.s.} X$. (*)


Definition 5.4 (Convergence in Probability). We write $X_n \xrightarrow{P} X$ if $\forall \epsilon > 0$, $P(|X_n - X| \ge \epsilon) \to 0$. Note that convergence in probability implies that the limit is finite almost surely.


Definition 5.5 (Convergence in Mean, a.k.a., $L^p$ Convergence). We write $X_n \xrightarrow{L^p} X$ if $E[|X_n - X|^p] \to 0$ as $n \to \infty$.


Definition 5.6 (Weak Convergence, a.k.a., Convergence in Distribution). A sequence of random variables $(X_n)_{n\ge1}$ converges in distribution, or converges weakly, to $X$ if

$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x),$$

for every $x \in \mathbb{R}$ at which $F_X$ is continuous. In this case, we write $X_n \xrightarrow{D} X$, or sometimes also $X_n \xrightarrow{dist} X$.





Proposition 5.7 (Equivalent Conditions for Convergence in Distribution). For any sequence $(X_n)_{n\ge1}$ of random variables, we have that $X_n \xrightarrow{D} X$ if and only if any of the following conditions are met:

(a) $E[f(X_n)] \to E[f(X)]$, for all bounded, continuous functions $f$;

(b) $E[f(X_n)] \to E[f(X)]$, for all bounded, Lipschitz functions $f$. Here, we say that a function $f : \mathbb{R} \to \mathbb{R}$ is Lipschitz, or $K$-Lipschitz, if there exists a constant $K > 0$ which is independent of $x, y \in \mathbb{R}$ such that $\forall x, y$: $|f(x) - f(y)| \le K|x - y|$;

(c) $\limsup_{n\to\infty} P(X_n \in C) \le P(X \in C)$, for all closed sets $C \subseteq \mathbb{R}$.


Example 5.8 (Fall 2018, #7). Let $X_1, X_2, \dots$ be iid random variables with common characteristic function $\varphi$, and let $S_n = \sum_{k=1}^n X_k$. Show that if $\varphi$ is differentiable at $0$ with $\varphi'(0) = i\mu$, then, as $n \to \infty$, $S_n/n \to \mu$ in probability.

Hints: Proof uses Lévy continuity theorem. Always we have $\varphi(0) = 1$, and here we have that

$$\varphi'(0) = \lim_{h\to0} \frac{\varphi(h) - 1}{h}.$$
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Example 5.9 (Fall 2016, #8). Show that for a sequence of random variables X,, one has that 
Xe, La ei probability if and only if 


E ae ee ot 1] a 0, 


as 2 — OO. 
Hints: 1) See the first argument; 2) The reverse direction uses Markov. 


Example 5.10 (Spring 2016, #4). Let $X_1, X_2, \dots$ be iid exponential variables with parameter $\lambda := 1$, and set
$$M_n := \max(X_1, \dots, X_n).$$

Find sequences $(a_n), (b_n) \subseteq \mathbb{R}$ such that $(M_n - a_n)/b_n$ converges in distribution.





Example 5.11 (Spring 2015, #6). Assume $(X_n)_{n\ge1}$ is a sequence of iid random variables with mean $0$ and variance $1$. Show that
$$\frac{\sqrt{n}X_1 + \sqrt{n-1}X_2 + \cdots + X_n}{n}$$

converges weakly (in distribution) to a normal $N(0, 1/2)$.


Example 5.12 (Spring 2015, #7). Assume that $(U_n)_{n\ge1}$ is a sequence of iid uniform random variables on $[0,1]$. Let $V_n := \max(U_1, U_2^2, \dots, U_n^n)$. Show that $(1 - V_n)\log(n)$ converges weakly (in distribution) to an exponential random variable with parameter $\lambda = 1$.


5.2 Remarks on the equivalence of studying convergence to zero 


Proposition 5.13 (Convergence to Zero). If $(Y_n)_{n\ge1}$ is any sequence of random variables, then $Y_n \xrightarrow{a.s.} 0 \iff \forall \epsilon > 0$, $P(\limsup |Y_n| \ge \epsilon) = 0$. And then as a consequence, $X_n \xrightarrow{a.s.} X \iff \forall \epsilon > 0$, $P(\limsup |X_n - X| \ge \epsilon) = 0$. (*)


Remark 5.14 (Notes on Convergence to Zero). We notice that it is sufficient to study convergence of a random variable to zero since we can always replace the expression $X_n \xrightarrow{a.s.} X$ by $Y_n := X_n - X \xrightarrow{a.s.} 0$. Next, we have the observation that for any sequence $(a_n)_{n\ge1}$,

$$a_n \to 0 \iff \limsup_{n\to\infty} |a_n| = 0 \iff \inf_{n \ge 1}\sup_{k \ge n} |a_k| = 0 \iff \forall \epsilon > 0,\ \exists N_\epsilon : \sup_{k \ge N_\epsilon} |a_k| < \epsilon.$$





Remark 5.15 (Borel-Cantelli as a Practical Tool for Proving Convergence). As a tool for proving almost sure convergence, we can apply Borel-Cantelli: If $\forall \epsilon > 0$, $\sum_{n \ge 1} P(|X_n - X| \ge \epsilon) < \infty$, then $X_n \xrightarrow{a.s.} X$. Note that the convergence of the above sum at minimum requires that $P(|X_n - X| \ge \epsilon) \to 0$ as $n \to \infty$ for all $\epsilon > 0$. We also point out the following key observations:

• If $\forall \epsilon > 0$: $\sum_{n \ge 1} P(|X_n - X| \ge \epsilon) < \infty$, then $P(\limsup\{|X_n - X| \ge \epsilon\}) = 0$, for all $\epsilon > 0$;

• If $\sum_{n \ge 1} P(|X_n - X| \ge \epsilon) < \infty$, $\forall \epsilon > 0$, then $X_n \xrightarrow{a.s.} X$, where we consider $P(|X_n - X| \ge \epsilon)$ to be the tail probability which we must estimate for $n$ large.
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5.3 Flowcharts of implications: convergence hieracrhchy 


We have the following map of convergence implications:

$L^p \implies L^q \implies P$ (for $p \ge q \ge 1$)

a.s. $\implies P \implies$ Dist

(Hierarchy of the Modes of Convergence)


We compare this to the real analysis modes summary chart reproduced as follows from Heil’s notes: 










[Chart: pointwise a.e. convergence implies convergence in measure (if $|E| < \infty$); $L^p$ norm convergence implies convergence in measure; convergence in measure implies pointwise a.e. convergence of a subsequence.]














Proposition 5.16 (Almost Everywhere Convergence of a Subsequence). Convergence in probability implies that there exists a subsequence $(k_n)$ along which the original sequence converges almost surely:

$$X_n \xrightarrow{P} X \implies \exists \{k_n\}_{n\ge1} \subseteq \mathbb{N} : X_{k_n} \xrightarrow{a.s.} X.$$


Disproof: Counterexample to "Convergence in Probability Implies Almost Sure Convergence". We will demonstrate a sequence of random variables that converges in probability to zero on $[0,1]$ (with Lebesgue measure), but which does not converge almost surely to zero. For $n \ge 1$ and $1 \le j \le n$, define

$$S_{n,j} := \Big[\frac{j-1}{n}, \frac{j}{n}\Big],$$

and set $X_{n,j}(x) := \chi_{S_{n,j}}(x)$, listing these as a single sequence $X_{1,1}, X_{2,1}, X_{2,2}, X_{3,1}, \dots$. Let $0 < \epsilon \le 1$ and observe that

$$P(\{x \in [0,1] : X_{n,j}(x) \ge \epsilon\}) = P(\{x \in [0,1] : X_{n,j}(x) = 1\}) = |S_{n,j}| = \frac{1}{n} \to 0.$$

On the other hand, given any $x \in [0,1]$ there are infinitely many pairs $(n,j)$ such that $x \in S_{n,j}$, i.e., such that $X_{n,j}(x) = 1$. This implies that along a subsequence the values at $x$ converge to $1$, and hence the sequence does not converge to $0$ almost surely on $[0,1]$.














Proof: Existence of an Almost Surely Convergent Subsequence. Suppose that $X_n \xrightarrow{P} X$. We need to find a subsequence $\{X_{n_k}\}_{k\ge1}$ such that $X_{n_k}(\omega) \to X(\omega)$ as $k \to \infty$ for all $\omega \in A$, where $A \subseteq \Omega$ is some event such that $P(A) = 1$. For $j \ge 1$, choose $L_j$ such that for all $k \ge L_j$:

$$P(\{\omega : |X_k - X|(\omega) > 1/j\}) \le 2^{-j}.$$

We can just as well assume here that the $L_j$'s satisfy $L_1 < L_2 < L_3 < \cdots$. For $j \ge 1$, we define

$$E_j := \{\omega : |X_{L_j} - X|(\omega) > 1/j\},$$

and set

$$Z := \bigcap_{m \ge 1}\bigcup_{j \ge m} E_j.$$

Now for $m \ge 1$, we can see that

$$P(Z) \le P\Big(\bigcup_{j \ge m} E_j\Big) \le \sum_{j \ge m} P(E_j) \le \sum_{j \ge m} 2^{-j} = 2^{1-m} \to 0,$$

as $m \to \infty$. So we conclude that $P(Z) = 0$. Next, if $\omega \in \Omega \setminus Z$, then $\omega \notin \bigcup_{j \ge m} E_j$ for some $m \ge 1$. This implies that $\omega \notin E_j$ for all $j \ge m$. Thus $|X_{L_j} - X|(\omega) \le 1/j$ for $j \ge m$, which implies that $\lim_{j\to\infty} X_{L_j}(\omega) = X(\omega)$ for all $\omega \in \Omega \setminus Z$. So it suffices to take $X_{n_j} := X_{L_j}$ and $A := \Omega \setminus Z$, so that $X_{n_j}(\omega) \to X(\omega)$ for all $\omega \in A$ where, as proved above, $P(A) = P(\Omega) - P(Z) = 1 - 0 = 1$.














5.4 Convergence theorems (analogs to real analysis) 


Theorem 5.17 (Lebesgue's DCT). Let $f_n$ be integrable $\forall n \ge 1$ such that (a) $f_n \to f$ a.e.; or (b) $f_n \to f$ in measure, for some measurable function $f$. If $|f_n(x)| \le g(x)$ a.e. for all $n$ where $g$ is integrable, then $f$ is integrable and

$$\lim_{n\to\infty} \int |f_n - f|\,d\mu = 0.$$

Theorem 5.18. If $f, g$ are measurable such that $|f| \le g$ and $g$ is integrable, then $f$ is integrable.

Theorem 5.19 (Monotone convergence theorem). Let $\{f_k\}_{k\ge1}$ be a sequence of measurable functions on $E$. Then

(1) If $f_k \nearrow f$ a.e. on $E$ and $\exists \varphi$ integrable such that $f_k \ge \varphi$ a.e. in $E$ for all $k \ge 1$, then

$$\lim_{k\to\infty} \int_E f_k = \int_E f;$$

(2) If $f_k \searrow f$ a.e. on $E$ and $\exists \varphi$ integrable such that $f_k \le \varphi$ $\forall k \ge 1$, then

$$\lim_{k\to\infty} \int_E f_k = \int_E f.$$

Theorem 5.20 (General form of LDCT). Let $f_k$ be measurable for all $k \ge 1$. Suppose that (a) $\lim_{k\to\infty} f_k = f$ a.e. in $E$; and (b) $\exists \varphi$ integrable such that for all $k \ge 1$, $|f_k| \le \varphi$ a.e. in $E$. Then

$$\lim_{k\to\infty} \int_E f_k = \int_E f.$$

Lemma 5.21 (Fatou). Suppose that $E \subseteq \mathbb{R}^n$ is measurable and let $f_k \ge 0$ be measurable for all $k \ge 1$. Then
$$\int_E \Big(\liminf_{k\to\infty} f_k\Big) \le \liminf_{k\to\infty} \int_E f_k.$$

We can obtain the same conclusion if we instead assume that $f_k \ge \varphi$ for all $k \ge 1$ where $\varphi$ is integrable on $E$. Notice that the statement of Fatou's lemma makes no a priori assumptions on the convergence of the sequence $\{f_k\}_{k\ge1}$.
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Theorem 5.22 (Uniform convergence theorems). We have the following variants of uniform con- 
vergence theorems: 


• $f_n \to f$ uniformly on $[a,b]$ with $f_n$ all continuous $\implies f$ is continuous.

• If $f_n$ is differentiable on $[a,b]$ and $\lim_{n\to\infty} f_n(x_0)$ exists for some $x_0 \in [a,b]$ and the $f_n'$ converge uniformly on $[a,b]$, then $f_n \to f$ uniformly on $[a,b]$ and $f'(x) = \lim_{n\to\infty} f_n'(x)$ for all $x \in [a,b]$.

• If $f_k$ is integrable on $[a,b]$ and $f_k \to f$ uniformly, then $f$ is integrable and

$$\int_{[a,b]} f = \lim_{k\to\infty} \int_{[a,b]} f_k.$$


5.5 Properties of the modes of convergence for random variables 


Fact: If $\mu(X) < \infty$, then $L^q(\mu) \subseteq L^p(\mu)$ for any $0 < p < q \le \infty$. This can be proved by applying Hölder with the conjugate exponents $p_0 := q/p$ and $q_0 := q/(q-p)$.














Proposition 5.23 (Cauchy-Schwarz and Hölder inequalities). Note that we assume that the functions $f, g$ are both square integrable, i.e., $|f|^2, |g|^2$ are both integrable. If this is the case, then

$$\Big|\int fg\Big| \le \Big(\int |f|^2\Big)^{1/2}\Big(\int |g|^2\Big)^{1/2}.$$

Note that the inequality above is the special case $p = q = 2$ of the more general Hölder inequality for $\frac{1}{p} + \frac{1}{q} = 1 \iff p + q = pq$:

$$\int |fg| \le \|f\|_p\|g\|_q.$$

Dual (conjugate) exponents: If $p + q = pq$, then $p = q/(q-1)$.











Note that Minkowski's inequality states that $\|f + g\|_p \le \|f\|_p + \|g\|_p$. Minkowski also implies that


| os [itll 


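Proof of Minkowski. We let $1 < p < \infty$ be arbitrary with $f, g \in L^p$ and observe that:

$$\|f + g\|_p^p = \int |f + g|^p\,d\mu = \int |f + g|^{p-1}|f + g|\,d\mu \le \int |f + g|^{p-1}(|f| + |g|)\,d\mu,$$

then we apply Hölder with exponents $p$ and $p/(p-1)$:

$$\le (\|f\|_p + \|g\|_p)\Big(\int |f + g|^p\,d\mu\Big)^{1 - 1/p} = (\|f\|_p + \|g\|_p)\,\|f + g\|_p^{p-1}.$$

Dividing both sides by $\|f + g\|_p^{p-1}$ (when it is nonzero) gives the claim.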
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If a > 0, we can obtain that 


$$\mu(\{x : |f(x)| > a\}) \le \Big(\frac{\|f\|_p}{a}\Big)^p.$$


Another important result which is non-trivial to prove and due to Riesz is that for $1 \le q < \infty$ and $g \in L^q$:

$$\|g\|_q = \sup\Big\{\Big|\int fg\Big| : \|f\|_p = 1\Big\}, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1.$$

Example 5.24 (Dense Class Arguments). Compactly supported continuous functions are dense in $L^1$: $\overline{C_c[a,b]} = L^1[a,b]$. Other dense classes of functions in $L^p$ are the simple (i.e., staircase) functions, and polynomials. If $f \in L^p$ and $\mathcal{C}$ is dense in $L^p$, then $\forall \epsilon > 0$, $\exists g \in \mathcal{C}$ such that $\|f - g\|_p < \epsilon$.





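Proposition 5.25. Provided that the probability space is complete:

• If $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{P} Y$, then $X = Y$ almost surely.

• If $X_n \xrightarrow{a.s.} X$ and $X_n \xrightarrow{a.s.} Y$, then $X = Y$ almost surely.

• If $X_n \xrightarrow{L^p} X$ and $X_n \xrightarrow{L^p} Y$, then $X = Y$ almost surely.

• If $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y$, then for any $a, b \in \mathbb{R}$, $aX_n + bY_n \xrightarrow{P} aX + bY$, and $X_nY_n \xrightarrow{P} XY$.

• If $X_n \xrightarrow{a.s.} X$, $Y_n \xrightarrow{a.s.} Y$, then for any $a, b \in \mathbb{R}$, $aX_n + bY_n \xrightarrow{a.s.} aX + bY$, and $X_nY_n \xrightarrow{a.s.} XY$.

• If $X_n \xrightarrow{L^p} X$, $Y_n \xrightarrow{L^p} Y$, then for any $a, b \in \mathbb{R}$, $aX_n + bY_n \xrightarrow{L^p} aX + bY$.

• None of the above statements are true for weak convergence in distribution.

• Convergence in distribution to a constant implies convergence in probability.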
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6 Laws of large numbers 


6.1 The weak law of large numbers 


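Example 6.1 (Key Observations). Suppose that $(X_i)_{i\ge1}$ have the same $L^2$ norm and that $E[X_i] = 0$ for all $i \ge 1$. Then:

• $\mathrm{Var}\big(\frac{X_1 + \cdots + X_n}{n}\big) = E\big[\big|\frac{X_1 + \cdots + X_n}{n}\big|^2\big]$, which implies that

$$\mathrm{Var}\Big(\frac{X_1 + \cdots + X_n}{n}\Big) = \Big\|\frac{X_1 + \cdots + X_n}{n}\Big\|_{L^2}^2 \le \frac{1}{n^2}\big(\|X_1\|_{L^2} + \cdots + \|X_n\|_{L^2}\big)^2 = \|X_1\|_{L^2}^2.$$

• If $X_1 = \cdots = X_n$, then $\mathrm{Var}\big(\frac{X_1 + \cdots + X_n}{n}\big) = \mathrm{Var}(X_1)$.

• If $X_1, \dots, X_n$ are iid, then

$$\mathrm{Var}\Big(\frac{X_1 + \cdots + X_n}{n}\Big) = \frac{1}{n^2}\sum_{i,j} \mathrm{Cov}(X_i, X_j) = \frac{1}{n^2}\sum_{i} \mathrm{Var}(X_i) = \frac{1}{n}\mathrm{Var}(X_1).$$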
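Theorem 6.2 (WLLN). If $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$, then $\bar S_n := \frac{X_1 + \cdots + X_n}{n} \xrightarrow{P,\, L^2} \mu$ as $n \to \infty$.

Proof. First, we show the statement of $L^2$ convergence. Since $\mu = E[\bar S_n]$:
\[
E[(\bar S_n - \mu)^2] = \mathrm{Var}(\bar S_n) = \frac{\sigma^2}{n} \to 0,
\]
as $n \to \infty$. So $\bar S_n \xrightarrow{L^2} \mu$. Next, by Chebyshev, for all $\varepsilon > 0$,
\[
P(|\bar S_n - \mu| \ge \varepsilon) \le \frac{\mathrm{Var}(\bar S_n)}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2} \to 0,
\]
as $n \to \infty$. Hence, $\bar S_n \xrightarrow{P} \mu$.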
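
As a small numerical illustration of the Chebyshev step, the sketch below compares the exact two-sided tail probability of the sample mean of fair coin flips with the bound $\sigma^2 / (n \varepsilon^2)$. The choice of Bernoulli(1/2) variables, the value $\varepsilon = 1/10$, and the function names are assumptions made only for this example.

```python
from fractions import Fraction
from math import comb

def exact_tail(n: int, eps: Fraction) -> float:
    """Exact P(|mean - 1/2| >= eps) for the mean of n fair coin flips."""
    half = Fraction(1, 2)
    # Count the outcomes k (number of heads) whose mean k/n deviates by at least eps.
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if abs(Fraction(k, n) - half) >= eps)
    return favourable / 2**n

def chebyshev_bound(n: int, eps: Fraction) -> float:
    """sigma^2 / (n * eps^2), with sigma^2 = 1/4 for Bernoulli(1/2)."""
    return float(Fraction(1, 4) / (n * eps**2))

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, exact_tail(n, eps), chebyshev_bound(n, eps))
```

The bound is far from tight here, but it already tends to zero, which is all the weak law needs.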
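Theorem 6.3 (WLLN, V2). Suppose that $X_1, \ldots, X_n$ are iid with mean $\mu$ and $E[X_1^4] < \infty$. Then
\[
\bar S_n \xrightarrow{L^4,\, a.s.} \mu .
\]

Proof. Since we can replace $X_i$ by $Y_i := X_i - \mu$ and bound the resulting norm of this random variable, we can assume that $\mu = 0$. Using Markov and the fourth-moment bound $E[\bar S_n^4] \le C / n^2$ (with $C$ depending only on $E[X_1^4]$),
\[
P(|\bar S_n| \ge \varepsilon) = P(\bar S_n^4 \ge \varepsilon^4) \le \frac{E[\bar S_n^4]}{\varepsilon^4} \le \frac{C}{\varepsilon^4 n^2} .
\]
Thus $\bar S_n \xrightarrow{P} 0$. And by Borel-Cantelli, we have that
\[
\sum_{n \ge 1} P(|\bar S_n| \ge \varepsilon) \le \sum_{n \ge 1} \frac{C}{\varepsilon^4 n^2} < \infty,
\]
and so $\bar S_n \xrightarrow{a.s.} 0$.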
Theorem 6.4. If $X_1, \ldots, X_n$ are independent with mean $\mu$ and are uniformly integrable, then $\bar S_n \xrightarrow{L^1} \mu$.
Example 6.5 (Spring 2019, #3). Assume that $\{X_n\}_{n \ge 1}$ are random variables such that

1. $E[X_n] = 0$ and $E[X_n^2] \le 1$ for any $n \ge 1$;

2. $E[X_i X_j] \le 0$ for any $i \ne j$.

Show that for any sequence $\{a_n\}_{n \ge 1} \subset [1/2, 2]$,
\[
\frac{a_1 X_1 + a_2 X_2 + \cdots + a_n X_n}{a_1 + a_2 + \cdots + a_n} \xrightarrow[n \to \infty]{P} 0 .
\]
(Hint: Use Chebyshev.)
6.2 Large deviations 
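If $X_1, \ldots, X_n$ are iid with mean 0 and $E[e^{a |X_1|}] < \infty$ for some $a > 0$, then
\[
P(|\bar S_n| > \varepsilon) \approx e^{-n I(\varepsilon)},
\]
where the rate function $I(\varepsilon)$ is given by
\[
I(\varepsilon) = \sup_{\lambda \in \mathbb{R}} \left\{ \varepsilon \lambda - \log E[e^{\lambda X_1}] \right\} .
\]

The function $f(\lambda) := \log E[e^{\lambda X_1}]$ is a convex function with
\[
f(0) = 0, \quad f'(0) = E[X_1], \quad f''(0) = \mathrm{Var}(X_1) .
\]

Also, the rate function is convex and satisfies
\[
I(0) = 0, \quad I'(0) = 0, \quad I''(0) = \frac{1}{\mathrm{Var}(X_1)} .
\]

We remark that if $I(a) < \infty$, then $\bar S_n$ "visits" $(a - \varepsilon, a + \varepsilon)$ with positive probability.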


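Theorem 6.6. The moment generating function of a random variable $X$ is defined to be $M(\lambda) := E[e^{\lambda X}]$. Define
\[
I(a) := \sup_{\lambda \in \mathbb{R}} \left\{ a \lambda - \log M(\lambda) \right\} .
\]
Then

(1) $P(\bar S_n \ge a) \le e^{-n I(a)}$, for $a > \mu$;

(2) $P(\bar S_n \le a) \le e^{-n I(a)}$, for $a < \mu$;

(3) We have that
\[
P(|\bar S_n - a| < \varepsilon) \ge \left( 1 - \frac{F(a)}{n \varepsilon^2} \right) e^{-n (I(a) + \varepsilon G(a))},
\]
where the functions $F, G$ are explicit in terms of $\mu, I, M$.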
Theorem 6.7 (WLLN, V3). If $(X_i)_{i \ge 1}$ are uncorrelated with mean $\mu$ such that $\sigma^2 = \sup_{i \ge 1} \mathrm{Var}(X_i) < \infty$, then $\bar S_n \xrightarrow{P,\, L^2} \mu$ as $n \to \infty$.

Theorem 6.8 (WLLN, First Improvement, V4). If $(X_i)_{i \ge 1}$ are iid, $\mu = E[X_1]$, and $E[X_1^4] < \infty$, then $\bar S_n \xrightarrow{P,\, L^4,\, a.s.} \mu$ as $n \to \infty$.


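Proof. This proof is instructive. We assume that $\mu = 0$, as otherwise we can take $\tilde X_i = X_i - \mu$ and notice that $\|\tilde X_i\|_{L^4} \le |\mu| + \|X_i\|_{L^4} < \infty$. Next, we can expand
\[
E[\bar S_n^4] = \frac{1}{n^4} E[(X_1 + \cdots + X_n)^4] = \frac{1}{n^4} \sum_{i,j,k,\ell=1}^{n} E[X_i X_j X_k X_\ell] .
\]
Observe that if one of the indices $i, j, k, \ell$ appears by itself in the previous expansion, then $E[X_i X_j X_k X_\ell] = 0$ (by independence and $\mu = 0$). The surviving terms are of the forms $(i = j \ne k = \ell)$, $(i = k \ne j = \ell)$, $(i = \ell \ne j = k)$, and $i = j = k = \ell$. In the first three cases, we obtain terms of $E[X_1^2]^2$, and in the last case we obtain $E[X_1^4]$. This implies that
\[
E[\bar S_n^4] = \frac{1}{n^4} \left( 3n(n-1) E[X_1^2]^2 + n E[X_1^4] \right) .
\]
Now since $\mathrm{Var}(X_1^2) = E[X_1^4] - E[X_1^2]^2 \ge 0$, we have that
\[
E[\bar S_n^4] \le \frac{1}{n^4} \left( 3n(n-1) + n \right) E[X_1^4] \le \frac{3 E[X_1^4]}{n^2} .
\]
Then this implies that
\[
P(|\bar S_n| \ge \varepsilon) = P(\bar S_n^4 \ge \varepsilon^4) \le \frac{3 E[X_1^4]}{\varepsilon^4 n^2} \to 0,
\]
as $n \to \infty$. Since these bounds are summable in $n$, by Borel-Cantelli we also have that $\bar S_n \xrightarrow{a.s.} 0$ as $n \to \infty$.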
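
The index-counting step can be checked exactly in a special case. The sketch below assumes iid Rademacher signs ($X_i = \pm 1$ with probability 1/2 each, so $E[X_i] = 0$ and $E[X_i^2] = E[X_i^4] = 1$), enumerates all $2^n$ outcomes, and compares $E[(X_1 + \cdots + X_n)^4]$ with $3n(n-1) E[X_1^2]^2 + n E[X_1^4]$. The distribution and the function names are choices made for this illustration only.

```python
from itertools import product

def fourth_moment_of_sum(n: int) -> float:
    """E[(X_1 + ... + X_n)^4] for iid Rademacher signs, by brute-force enumeration."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2**n  # each of the 2^n sign patterns is equally likely

def counting_formula(n: int) -> int:
    """3n(n-1) E[X^2]^2 + n E[X^4], with E[X^2] = E[X^4] = 1 for Rademacher signs."""
    return 3 * n * (n - 1) + n

for n in range(1, 9):
    print(n, fourth_moment_of_sum(n), counting_formula(n))
```

Dividing by $n^4$ recovers the $O(1/n^2)$ decay of $E[\bar S_n^4]$ used in the proof.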





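Remark 6.9 (Observations and Extensions). If $(X_i)$ are independent, $E[X_i] = 0$ for all $i \ge 1$, and $\sup_{i \ge 1} E[X_i^4] < \infty$, then $\bar S_n \xrightarrow{a.s.} 0$. If $(X_i)_{i \ge 1}$ are independent, $E[X_i] = 0$ $\forall i$, and $\sup_{i \ge 1} E[X_i^{2k}] < \infty$, then $\bar S_n \xrightarrow{a.s.} 0$. In fact, we can show that $P(|\bar S_n| \ge \varepsilon) \le \frac{C_k}{\varepsilon^{2k} n^k}$, where the constant $C_k$ depends only on the fixed $k \ge 1$.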


6.3 The strong law of large numbers

Theorem 6.10 (SLLN). Suppose that $X_1, \ldots, X_n$ are iid. Then $\bar S_n \xrightarrow{a.s.} \mu$ as $n \to \infty$ if and only if $E[|X_1|] < \infty$ and $E[X_1] = \mu$.


Remark 6.11 (Finite Expectations and Forward Direction Proof Steps). Recall that
\[
E[|X|] < \infty \iff \sum_{n \ge 1} P[|X| \ge n] < \infty .
\]
This is because we know that
\[
E[|X|] = \int_0^\infty P(|X| \ge x) \, dx .
\]
And, as we can see by taking unit-width rectangles, a non-increasing $f \ge 0$ satisfies
\[
f(1) + f(2) + \cdots \le \int_0^\infty f(x) \, dx \le f(0) + f(1) + f(2) + \cdots,
\]
so we have that when $f$ is non-increasing (for all large enough $x \ge 1$):
\[
\int_0^\infty f(x) \, dx < \infty \iff \sum_{n \ge 1} f(n) < \infty .
\]
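
The rectangle comparison gives the two-sided estimate $\sum_{n \ge 1} P(|X| \ge n) \le E[|X|] \le 1 + \sum_{n \ge 1} P(|X| \ge n)$. A minimal exact check on a small, hypothetical discrete distribution (the values and probabilities below are invented for illustration):

```python
from fractions import Fraction

# Hypothetical finite distribution {value: probability}, chosen only for illustration.
dist = {
    Fraction(1, 2): Fraction(1, 2),
    Fraction(5, 2): Fraction(1, 4),
    Fraction(-7): Fraction(1, 4),
}

def mean_abs(dist):
    """E[|X|] for a finite discrete distribution."""
    return sum(abs(x) * p for x, p in dist.items())

def tail_sum(dist):
    """sum_{n >= 1} P(|X| >= n); a finite sum because the support is bounded."""
    top = int(max(abs(x) for x in dist))
    return sum(
        sum(p for x, p in dist.items() if abs(x) >= n)
        for n in range(1, top + 1)
    )

print(tail_sum(dist), mean_abs(dist), 1 + tail_sum(dist))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.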


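Now, secondly, write $S_n := X_1 + \cdots + X_n$. We also observe that if $\frac{S_n}{n} \xrightarrow{a.s.} \mu$, then $\frac{X_n}{n} = \frac{S_n - S_{n-1}}{n} \xrightarrow{a.s.} 0$. To see this we look at the limiting behavior of
\[
\frac{X_n}{n} = \frac{S_n}{n} - \left( \frac{n-1}{n} \right) \frac{S_{n-1}}{n-1} \longrightarrow \mu - \mu = 0 .
\]
So we conclude that $\frac{X_n}{n} \xrightarrow{a.s.} 0$.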


Claim: This observation implies that $X_1$ is integrable.

Proof. Suppose that in fact $X_1$ is not integrable. Then we have that $\sum_{n \ge 1} P[|X_1| \ge n] = +\infty$. Since the $X_i$'s are iid, $P[|X_n| \ge n] = P[|X_1| \ge n]$ and the events $\{|X_n| \ge n\}$ are independent, so by Borel-Cantelli (II), $P[\limsup_{n \to \infty} \{|X_n| \ge n\}] = 1$. But this contradicts our fact proven above that $\frac{X_n}{n} \xrightarrow{a.s.} 0$.














Finally, in summary: if $E[|X_1|] < \infty$, then (by the reverse direction) $\frac{S_n}{n} \xrightarrow{a.s.} \tilde\mu$ with $\tilde\mu = E[X_1]$. But we also have that $\frac{S_n}{n} \xrightarrow{a.s.} \mu$ by hypothesis, so we obtain that in fact $\mu = E[X_1]$. This concludes the proof of the forward direction of the SLLN.


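Lemma 6.12 (Real Analysis Fact). If $\sum_{n \ge 1} \frac{x_n}{n}$ converges, then immediately $\frac{x_1 + \cdots + x_n}{n} \to 0$.

Proof. By hypothesis, we have that $S_n \to S$ as $n \to \infty$, where $S_n := \sum_{k=1}^{n} \frac{x_k}{k}$ and $S_0 := 0$. Then $x_n = (S_n - S_{n-1}) \cdot n$. So
\[
\frac{x_1 + \cdots + x_n}{n} = \frac{1}{n} \left( S_1 - S_0 + 2(S_2 - S_1) + \cdots + n(S_n - S_{n-1}) \right)
= \frac{1}{n} \left( n S_n - S_0 - S_1 - \cdots - S_{n-1} \right)
= S_n - \frac{1}{n} \sum_{k=0}^{n-1} S_k \longrightarrow S - S = 0,
\]
since the Cesàro averages of a convergent sequence converge to the same limit.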
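
A quick numerical sanity check of the lemma, under the assumption $x_k = (-1)^k \sqrt{k}$ (a sequence chosen here only for illustration): the series $\sum_k x_k / k = \sum_k (-1)^k / \sqrt{k}$ converges as an alternating series, and the averages of the $x_k$ should then tend to zero even though $|x_k| \to \infty$.

```python
import math

def series_and_average(n_max: int):
    """Return (sum_{k<=n} x_k / k, (x_1 + ... + x_n) / n) for x_k = (-1)^k sqrt(k)."""
    series = 0.0  # partial sum S_n of x_k / k
    total = 0.0   # running sum x_1 + ... + x_n
    for k in range(1, n_max + 1):
        x = (-1) ** k * math.sqrt(k)
        series += x / k
        total += x
    return series, total / n_max

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, series_and_average(n))
```

The first component stabilizes while the second shrinks roughly like $1 / (2\sqrt{n})$.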


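Theorem 6.13 (Kolmogorov). If $Y_n$ is a sequence of independent random variables such that $\sum_{n \ge 1} \mathrm{Var}(Y_n) < \infty$, then $\sum_{n \ge 1} (Y_n - E[Y_n])$ is a.s. convergent. If $Y_n = \frac{X_n}{n}$ satisfies $\sum_{n \ge 1} \frac{\mathrm{Var}(X_n)}{n^2} < \infty$, then $\sum_{n \ge 1} \frac{X_n - E[X_n]}{n}$ converges a.s. As a consequence (using Lemma 6.12), if $X_1, \ldots, X_n$ are iid with mean $\mu$ and finite variance, then $\bar S_n \xrightarrow{a.s.} \mu$.

As a comment, if $\mathrm{Var}(X_1) < \infty$, we have already proved that
\[
P[|\bar S_n - \mu| \ge \varepsilon] \le \frac{\mathrm{Var}(X_1)}{n \varepsilon^2},
\]
but in this case it's NOT Borel-Cantelli that provides the a.s. convergence, since these bounds are not summable in $n$.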


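Remark 6.14. To finish the sketch of the reverse direction of the SLLN, we next sketch the remaining key ideas of this part of the proof. Let $Y_n := X_n \chi_{\{|X_n| \le n\}}$ denote the truncated variables. In particular, we can 1) use Borel-Cantelli to show the following:
\[
\sum_{n \ge 1} P[Y_n \ne X_n] = \sum_{n \ge 1} P[|X_n| > n] = \sum_{n \ge 1} P[|X_1| > n] < \infty \implies P[\limsup \{X_n \ne Y_n\}] = 0,
\]
i.e., for all large enough $n$, $X_n = Y_n$ (a.s.), so that
\[
\frac{X_1 + \cdots + X_n}{n} \xrightarrow{a.s.} \mu \iff \frac{Y_1 + \cdots + Y_n}{n} \xrightarrow{a.s.} \mu .
\]
Then 2) we can use Kolmogorov's theorem on the variance to show that $\frac{Y_1 + \cdots + Y_n}{n} \xrightarrow{a.s.} \mu$.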



















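Corollary 6.15 (Consequences of the SLLN). If $X_1, \ldots, X_n$ are iid, then for any $f : \mathbb{R} \to \mathbb{R}$ which is measurable and bounded, we have that $\frac{1}{n} \sum_{i=1}^{n} f(X_i) \xrightarrow{a.s.} E[f(X_1)]$. For example, suppose that $f(x) = \chi_{[a,b]}(x)$. Then $\frac{1}{n} \sum_{i=1}^{n} \chi_{[a,b]}(X_i) \xrightarrow{a.s.} E[\chi_{[a,b]}(X_1)] = P[a \le X_1 \le b]$, where the former (non-limiting) sum corresponds to the relative frequency of the $X_i$ in $[a,b]$. Now for the special case where $a = -\infty$ and $b = x$, we obtain that
\[
\frac{1}{n} \sum_{i=1}^{n} \chi_{(-\infty, x]}(X_i) \xrightarrow{a.s.} F_{X_1}(x) = P[X_1 \le x] .
\]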
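
The empirical-CDF statement can be illustrated with a short simulation. The sketch below assumes $X_i \sim \mathrm{Uniform}(0,1)$, so that $F_{X_1}(x) = x$ on $[0,1]$. The sample size, the seed, and the helper name `empirical_cdf` are choices made for this example.

```python
import random

def empirical_cdf(samples, x):
    """Relative frequency of the samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

random.seed(0)  # fixed seed so that the run is reproducible
samples = [random.random() for _ in range(20000)]  # X_i ~ Uniform(0, 1)

for x in (0.1, 0.3, 0.9):
    # For the uniform distribution the true CDF value is x itself.
    print(x, empirical_cdf(samples, x))
```

With 20000 samples the standard deviation of each relative frequency is at most about 0.0035, so the printed values should sit close to 0.1, 0.3, and 0.9.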


Example 6.16 (Spring 2018, #2). Suppose that $f$ is a continuous function on $[0,1]$. Use the Law of Large Numbers to prove that
\[
\lim_{n \to \infty} \int_0^1 \cdots \int_0^1 f\left( (x_1 x_2 \cdots x_n)^{1/n} \right) dx_1 \cdots dx_n = f\left( \frac{1}{e} \right) .
\]


(Hints: Use continuity and dominated convergence in the end.) 


Example 6.17 (Fall 2018, #1). Use the SLLN to find the following limit:
\[
\lim_{n \to \infty} \int_0^1 \cdots \int_0^1 \frac{x_1^2 + \cdots + x_n^2}{x_1 + \cdots + x_n} \, dx_1 \cdots dx_n .
\]


Example 6.18 (Spring 2017, #3). Let $X_1, X_2, \ldots$ be iid random variables uniformly distributed on $[0,1]$. Show that with probability 1,
\[
\lim_{n \to \infty} (X_1 X_2 \cdots X_n)^{1/n}
\]
exists and compute its value.


Example 6.19 (Spring 2015, #8). Let $(X_n)_{n \ge 1}$ be an iid sequence of positive random variables such that $E[X_1] < \infty$. Let
\[
N_t := \sup \{ n : X_1 + \cdots + X_n \le t \} .
\]
Show that
\[
\frac{N_t}{t} \xrightarrow[t \to \infty]{a.s.} \frac{1}{E[X_1]},
\]
where the convergence is in the almost sure sense.


Example 6.20 (Fall 2019, #6). Assume $(X_n)_{n \ge 1}$ is an iid sequence of positive random variables.


Show that sa ae én 
ee “51 <> X,=1as. 


n noo 





6.4 Concentration inequalities 
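Theorem 6.21. If $X_1, \ldots, X_n$ are iid such that $a \le X_i \le b$ for all $i$, then
\[
P(\bar S_n - \mu \ge \varepsilon) \le e^{-2 n \varepsilon^2 / (b-a)^2} .
\]
Also, notice that
\[
\mathrm{Var}[X_i] \le \frac{(b-a)^2}{4} .
\]

Proof Notes. We require the following lemma: If $a \le X \le b$, then $\mathrm{Var}(X) \le \frac{(b-a)^2}{4}$.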
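Proposition 6.22. A function $f : \mathbb{R}^n \to \mathbb{R}$ is 1-Lipschitz if
\[
|f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n)| \le \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2} .
\]
Suppose that $X_1, \ldots, X_n$ are iid, $X_i \sim N(0,1)$, and that $f$ is 1-Lipschitz on $\mathbb{R}^n$ such that $f$ is differentiable and $\sup_{x \in \mathbb{R}^n} |\nabla f(x)| \le 1$. Then
\[
P(|f(X_1, \ldots, X_n) - E[f(X_1, \ldots, X_n)]| \ge \lambda) \le 2 e^{-\lambda^2 / 2} .
\]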


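Example 6.23 (Cauchy-Schwarz and 1-Lipschitz Functions). (Important.) An example of a 1-Lipschitz function $f$ is given by $f(x_1, \ldots, x_n) = \frac{x_1 + \cdots + x_n}{\sqrt{n}}$. To show this, we can pull a trick out of our hat in the form of "forcing" the Cauchy-Schwarz inequality in the following form:
\[
\sum_{i=1}^{n} |a_i| \cdot 1 \le \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} 1^2 \right)^{1/2} = \sqrt{n} \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} .
\]
In our case, we have that
\[
|f(x) - f(y)| \le \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |x_i - y_i| \le \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2} .
\]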


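Example 6.24. If $X_1, \ldots, X_n$ are iid with $X_i \sim N(0,1)$, we get that for $f(x_1, \ldots, x_n) := \frac{1}{\sqrt{n}} (x_1 + \cdots + x_n)$:
\[
P\left( \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \right| \ge \lambda \right) \le 2 e^{-\lambda^2 / 2} .
\]
And if we take $\lambda := \sqrt{n} \, \varepsilon$, then
\[
P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i \right| \ge \varepsilon \right) \le 2 e^{-n \varepsilon^2 / 2} .
\]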

Example 6.25 (Spring 2018, #6). Let $(X_n)_{n \ge 1}$ be a sequence of iid random variables with
\[
P(X_n = 1) = \frac{1}{2} = P(X_n = -1) .
\]
Let $(Y_n)_{n \ge 1}$ be a bounded sequence of random variables such that $P(Y_n \ne X_n) \le e^{-n}$. Show that








= va) <2e-"7/2, VA > 0. 


yx 


i=1 








= -) < de-N /2. 


1 

=H aaa Yn) | 
as $n \to \infty$.
Example 6.26 (Spring 2017, #2; Spring 2016, #1). Let $X$ be a random variable with mean zero and finite variance $\sigma^2$. Prove that for every $c > 0$,
\[
P(X \ge c) \le \frac{\sigma^2}{\sigma^2 + c^2} .
\]
(HINT: Combine the inequality $E[c - X] \le E[(c - X) \chi_{\{X < c\}}]$ with the Cauchy-Schwarz inequality.)
(HINT: Write $c - X = (c - X)_+ - (c - X)_-$ and then use Cauchy-Schwarz.) (+)
7 Normal distributions


7.1 One-dimensional case 


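Definition 7.1 (Normal and Standard Normal Distributions). We say that $Z \sim N(0,1)$ and $X \sim N(\mu, \sigma^2)$ if their probability density functions satisfy
\[
f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad z \in \mathbb{R},
\]
\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2 / (2\sigma^2)}, \quad x \in \mathbb{R} .
\]
One key fact relating these two normal distributions is that if $X \sim N(\mu, \sigma^2)$, then the random variable $Z := \frac{X - \mu}{\sigma} \sim N(0,1)$. In particular, given this definition, $F_Z(z) = F_X(\sigma z + \mu)$ and $f_Z(z) = \sigma \cdot f_X(\sigma z + \mu)$.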


Proposition 7.2 (Properties of the Standard Normal). We have the following properties of a standard normal random variable $Z \sim N(0,1)$:

(1) $E[Z] = 0$ and $\mathrm{Var}(Z) = 1$;
(2) $M_Z(\lambda) = E[e^{\lambda Z}] = e^{\lambda^2/2}$ for all $\lambda \in \mathbb{C}$.


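Proof of (1). First, we compute that
\[
E[Z] = \int_{-\infty}^{\infty} z f_Z(z) \, dz = \int_{-\infty}^{\infty} \frac{z}{\sqrt{2\pi}} e^{-z^2/2} \, dz = 0,
\]
since the right-hand-side integral is of an odd function over a symmetric interval. Next, we can compute by integration by parts that
\[
\mathrm{Var}(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2} \, dz
= \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2} \, dz = 0 + 1 = 1 .
\]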

















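Proof of (2). We have that
\[
M_Z(\lambda) = E[e^{\lambda Z}] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\lambda z} e^{-z^2/2} \, dz
= \frac{e^{\lambda^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z^2 - 2\lambda z + \lambda^2)} \, dz
= \frac{e^{\lambda^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(z - \lambda)^2/2} \, dz = e^{\lambda^2/2} .
\]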














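Corollary 7.3. If $X \sim N(\mu, \sigma^2)$, then its moment generating function is given by $M_X(\lambda) = e^{\lambda\mu + (\lambda\sigma)^2/2}$.

Proof. We can write $X = \mu + \sigma Z$, where $Z \sim N(0,1)$. This implies that
\[
M_X(\lambda) = E[e^{\lambda X}] = E[e^{\lambda(\mu + \sigma Z)}] = e^{\lambda\mu} E[e^{\lambda\sigma Z}] = e^{\lambda\mu + \lambda^2\sigma^2/2} .
\]

Note that the moments of a centered normal $Z \sim N(0, \sigma^2)$ are given by
\[
E[Z^n] = \begin{cases} 0, & \text{if } n \text{ is odd;} \\ \sigma^n (n-1)!! = \sigma^n (n-1)(n-3) \cdots 3 \cdot 1, & \text{if } n \text{ is even.} \end{cases}
\]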
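
The double-factorial formula for the moments of a centered normal can be checked numerically. The sketch below approximates $E[Z^n]$ for $Z \sim N(0, \sigma^2)$ with a composite Simpson rule on a truncated interval and compares it with $\sigma^n (n-1)!!$. The truncation width, the step count, and the value $\sigma = 1.5$ are assumptions made for this example.

```python
import math

def double_factorial(m: int) -> int:
    """m!! = m (m-2) (m-4) ..., with the convention that it equals 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def normal_moment(n: int, sigma: float, half_width: float = 12.0, steps: int = 24000) -> float:
    """Approximate E[Z^n], Z ~ N(0, sigma^2), by Simpson's rule on [-12 sigma, 12 sigma]."""
    a, b = -half_width * sigma, half_width * sigma
    h = (b - a) / steps  # steps must be even for Simpson's rule

    def integrand(z: float) -> float:
        density = math.exp(-z * z / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
        return z**n * density

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

sigma = 1.5
for n in (1, 2, 3, 4, 6):
    exact = 0.0 if n % 2 else sigma**n * double_factorial(n - 1)
    print(n, normal_moment(n, sigma), exact)
```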


Example 7.4 (Fall 2018, #3). Let $(Z_n)_{n \ge 1}$ be iid standard normal random variables, and let $(a_n)_{n \ge 1} \subset \mathbb{R}$. Prove that $\sum_{n \ge 1} a_n Z_n^2 < \infty \iff \sum_{n \ge 1} a_n < \infty$.














Example 7.5 (Spring 2016, #6). Let $X_1, X_2, \ldots$ be iid standard normal random variables, and for $x \in (-1, 1)$, set
\[
Y := \sum_{n \ge 1} x^n X_n .
\]
Show that the sum defining $Y$ converges and find its distribution.
(See this example showing the sum is convergent using Borel-Cantelli.)


7.2 Multidimensional normal distributions

Definition 7.6. The random variable $Z := (Z_1, \ldots, Z_d)$ is a standard multidimensional normal vector if $Z_1, \ldots, Z_d$ are iid with $Z_i \sim N(0,1)$. We write that $X = (X_1, \ldots, X_d) \sim N(\mu, C)$ with $\mu \in \mathbb{R}^d$ and covariance matrix $C$ if¹
\[
f_X(x) = \frac{1}{\sqrt{(2\pi)^d \det C}} \, e^{-\langle C^{-1}(x - \mu), \, x - \mu \rangle / 2} .
\]
One particularly nice property relating these two types of multidimensional normal variables is that
$X \sim N(\mu, C) \implies Z = C^{-1/2}(X - \mu) \sim N(0, \mathrm{Id}) \implies X = \mu + C^{1/2} Z$ with $Z \sim N(0, \mathrm{Id})$.


Exercise 7.7. Show that if $X = (X_1, \ldots, X_n) \sim N(\mu, C)$, then $C_{ij} = \mathrm{Cov}(X_i, X_j)$ and $E[X_i] = \mu_i$.


Proposition 7.8 (Properties and Transformations). We have the following properties and transformation results relating multidimensional normal vectors:

(1) $X \sim N(\mu, C) \implies X = \mu + C^{1/2} Z$ where $Z = (Z_1, \ldots, Z_d)$ for $Z_i$ iid and $Z_i \sim N(0,1)$.

(2) If $X \sim N(\mu, C)$ and $A : \mathbb{R}^d \to \mathbb{R}^m$ is a linear map, then $Y := AX \sim N(A\mu, ACA^T)$.

(3) If $X \sim N(\mu, C)$, then for any $a_1, \ldots, a_d \in \mathbb{R}$, $\sum_{i=1}^{d} a_i X_i \sim N\left( \sum_{i=1}^{d} a_i \mu_i, \, \sum_{i,j=1}^{d} C_{ij} a_i a_j \right)$.

(4) If $Z \sim N(0, \mathrm{Id})$ and $U$ is an orthogonal matrix, then $\tilde Z := UZ \sim N(0, \mathrm{Id})$.


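Sketch of (2). We first cite the following theorem: If $M_X(\lambda) = M_Y(\lambda)$ for all $\|\lambda\|_\infty < \varepsilon$ (for some $\varepsilon > 0$), then $X$ and $Y$ have the same multidimensional distribution.
Step 1: Notice that
\[
M_X(\lambda) = E[e^{\langle \lambda, X \rangle}] = e^{\langle \lambda, \mu \rangle + \frac{1}{2} \langle C\lambda, \lambda \rangle} .
\]
Step 2: We compute that
\[
M_{AX}(\lambda) = E\left[ e^{\langle \lambda, AX \rangle} \right] = E\left[ e^{\langle A^T\lambda, X \rangle} \right]
= e^{\langle A^T\lambda, \mu \rangle + \frac{1}{2} \langle C A^T\lambda, A^T\lambda \rangle}
= e^{\langle \lambda, A\mu \rangle + \frac{1}{2} \langle A C A^T\lambda, \lambda \rangle}
= M_{N(A\mu, ACA^T)}(\lambda) .
\]

¹ Note that if $C \in \mathbb{R}^{d \times d}$ is symmetric, $C \ge 0$, then $C = UDU^T$ where $U$ is orthogonal and $D$ is a diagonal matrix of eigenvalues of $C$. In particular, this shows that $C^{1/2} = U D^{1/2} U^T$ in the computations from the last definitions.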
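
The covariance part of (2) can also be seen directly from a representation $X = \mu + LZ$ with $LL^T = C$. The notes use $L = C^{1/2}$. The sketch below swaps in a lower-triangular factor instead, which is an assumption made to keep the arithmetic exact. Then $AX = A\mu + (AL)Z$ has covariance $(AL)(AL)^T$, which should agree with $ACA^T$. The matrices and helper functions are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical example: a lower-triangular factor L, so that C = L L^T is a covariance matrix.
L = [[2.0, 0.0],
     [1.0, 3.0]]
C = matmul(L, transpose(L))  # [[4, 2], [2, 10]]

# A linear map from R^2 to R^3.
A = [[1.0, -1.0],
     [0.5, 2.0],
     [3.0, 0.0]]

# Covariance of AX read off from AX = A mu + (A L) Z with Z ~ N(0, Id).
AL = matmul(A, L)
cov_from_representation = matmul(AL, transpose(AL))

# Covariance predicted by the proposition.
cov_from_formula = matmul(matmul(A, C), transpose(A))

print(cov_from_representation)
print(cov_from_formula)
```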
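Example 7.9. Let $X := [X_1, X_2]^T \sim N(\mu, C)$ where $X = \mu + C^{1/2} Z$ for $Z \sim N(0, \mathrm{Id})$. Suppose that
\[
C = \begin{pmatrix} \mathrm{Cov}_{X_1, X_1} & 0 \\ 0 & \mathrm{Cov}_{X_2, X_2} \end{pmatrix},
\]
where $\mathrm{Cov}_{X_1, X_2} = 0 \iff X_1, X_2$ are normal and independent (for a jointly normal vector). Then
\[
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \mathrm{Cov}_{X_1, X_1}^{1/2} Z_1 \\ \mathrm{Cov}_{X_2, X_2}^{1/2} Z_2 \end{pmatrix},
\]
so that the $X_i = \mu_i + \mathrm{Cov}_{X_i, X_i}^{1/2} Z_i$ are independent, with the $Z_i$ independent.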


Theorem 7.10 (Concentration Inequality). If $Z_1, \ldots, Z_d$ are iid with $Z_i \sim N(0,1)$, then for any 1-Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}$, we have that
\[
P(|f(Z_1, \ldots, Z_d) - E[f(Z_1, \ldots, Z_d)]| \ge \lambda) \le C_1 e^{-C_2 \lambda^2},
\]
where the sharp second constant is given by $C_2 = 1/2$.

Proof Components. We cite Jensen's inequality, which states that for $\phi$ convex: $E[\phi(X)] \ge \phi(E[X])$. We also use the identity $E[e^{\lambda f(Z)}] \, E[e^{-\lambda f(Z')}] = E[e^{\lambda (f(Z) - f(Z'))}]$, where $Z'$ is an independent copy of $Z$.


























Example 7.11 (Fall 2016, #4). Let $(X, Y)$ be a normal vector in $\mathbb{R}^2$ with mean zero and covariance matrix:
5 61
Pe al '
Find $E[X^2 Y^2]$.

Example 7.12 (Spring 2016, #2). Let $X = (X_1, X_2)$ be a Gaussian vector with zero mean and covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\]
where $|\rho| < 1$. Find a matrix $A$ such that $X = AZ$ where $Z$ is a standard normal vector, and derive the characteristic function of $X$ as a function of $\rho$.


Example 7.13 (Spring 2015, #3). Assume that $(X, Y)$ is a joint normal vector with $E[X] = E[Y] = 0$. Show that
\[
E[X^2 Y^2] \ge E[X^2] E[Y^2],
\]
with equality if and only if $X, Y$ are independent.


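Remark 7.14 (Notes on the Examples). If we assume that $(X, Y)$ is a joint normal vector with mean $E[X] = E[Y] = 0$ and variance $\sigma^2$, then the covariance matrix for the pair can be written as
\[
C = \begin{pmatrix} \sigma^2 & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(X, Y) & \sigma^2 \end{pmatrix} .
\]
This implies a formula for the characteristic function of $(X, Y)$ given by
\[
f_{X,Y}(\xi, \eta) = E[e^{i(\xi X + \eta Y)}] = \exp\left( - \frac{(\xi\sigma)^2 + (\eta\sigma)^2 + 2 \xi \eta \, \mathrm{Cov}(X, Y)}{2} \right) .
\]
One usage of this characteristic function would be to compute the expectations
\[
E[X^2 Y^2] = \frac{1}{i^4} \frac{\partial^4}{\partial \xi^2 \, \partial \eta^2} f_{X,Y}(\xi, \eta) \Big|_{\xi = \eta = 0} .
\]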
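
As a worked sketch of this computation (under the assumption of a centered joint normal pair, with the shorthand $a := E[X^2]$, $b := E[Y^2]$, $c := \mathrm{Cov}(X, Y)$ introduced here for convenience), one can read the mixed fourth derivative off the Taylor expansion of the characteristic function:

```latex
% Characteristic function of a centered joint normal pair:
%   f(\xi,\eta) = \exp(-Q/2), \qquad Q := a\xi^2 + b\eta^2 + 2c\,\xi\eta .
% Only the Q^2/8 term of the exponential series contributes to \xi^2\eta^2:
\[
  e^{-Q/2} = 1 - \frac{Q}{2} + \frac{Q^2}{8} - \cdots,
  \qquad
  [\xi^2\eta^2]\, Q^2 = 2ab + 4c^2 .
\]
% Since \partial_\xi^2 \partial_\eta^2 (\xi^2\eta^2) = 4 and i^4 = 1:
\[
  E[X^2Y^2]
  = \frac{\partial^4 f}{\partial\xi^2\,\partial\eta^2}\Big|_{\xi=\eta=0}
  = 4 \cdot \frac{2ab + 4c^2}{8}
  = E[X^2]\,E[Y^2] + 2\,\mathrm{Cov}(X,Y)^2 .
\]
```

In particular $E[X^2 Y^2] \ge E[X^2] E[Y^2]$, with equality exactly when $\mathrm{Cov}(X, Y) = 0$, which for a joint normal pair means independence.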




8 Central limit theorem 


Theorem 8.1 (Central Limit Theorem, V1). Let $X_1, \ldots, X_n, \ldots$ be iid random variables with mean $\mu$ and finite variance $\sigma^2$. If $-\infty \le a < b \le \infty$, then
\[
\lim_{n \to \infty} P\left( a \le \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le b \right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx,
\]
where $\sigma = \sqrt{\mathrm{Var}(X_1)}$. This means that the centered and normalized sum converges weakly to a standard normal random variable $N(0,1)$.





Theorem 8.2 (Central Limit Theorem, V2). Let $X_1, \ldots, X_n, \ldots$ be iid random variables with mean $\mu$ and finite variance $\sigma^2$. Let $\bar S_n := \frac{X_1 + \cdots + X_n}{n}$. Then
\[
\sqrt{n} \, (\bar S_n - \mu) \xrightarrow{dist} N(0, \sigma^2),
\]
as $n \to \infty$.


Example 8.3 (Spring 2018, #4). Let $X_1, \ldots, X_n$ be iid random variables with mean $\mu$ and variance $\sigma^2 < \infty$. Let $f$ be a function continuously differentiable at the point $\mu$. Prove that the sequence of random variables
\[
\sqrt{n} \left( f\left( \frac{X_1 + \cdots + X_n}{n} \right) - f(\mu) \right)
\]
converges in distribution to a normal random variable. What are the mean and the variance of the limit? (HINT: Use the first-degree Taylor approximation of $f$ about $\mu$.)

Proof. If we define $Y_n := \sqrt{n} \left( \frac{X_1 + \cdots + X_n}{n} - \mu \right)$, then $Y_n \xrightarrow{dist} Y$, where $Y \sim N(0, \sigma^2)$ (by the Central Limit Theorem). Also, by the SLLN, $Y_n / \sqrt{n} \xrightarrow{a.s.} 0$.














Example 8.4 (Spring 2018, #5). Let $X_1, \ldots, X_n, \ldots$ be iid random variables with $E[X_1] = 0$ and $\mathrm{Var}(X_1) = 1$. Define $S_n := X_1 + \cdots + X_n$. Prove that
\[
\limsup_{n \to \infty} \frac{S_n}{\sqrt{n}} = +\infty .
\]
(Uses Kolmogorov's 0-1 law.)


Example 8.5 (Spring 2017, #6). Let $(X_i)$ be iid random variables which are uniformly distributed on $[0,2]$. Let $S_n := X_1 + \cdots + X_n$. Show that
\[
3\sqrt{3} \, n^{1/6} \left( \sqrt[3]{S_n} - \sqrt[3]{n} \right) \xrightarrow{dist} Z,
\]
where $Z$ is a standard normal random variable.


Example 8.6 (Spring 2016, #7). Let $X_1, X_2, \ldots$ be independent random variables such that $X_n \sim \mathrm{Bernoulli}(p_n)$ for some $p_n > 0$. Show that if $n p_n (1 - p_n) \to \infty$, then


Xn — Pn 


SS NG: 
Mn = Py) 


(Uses Lindeberg’s CLT.) 
Example 8.7 (Spring 2015, #2). Assume that $X_1, X_2, \ldots$ is a sequence of iid random variables such that for some $\alpha < 1/2$,
\[
\frac{X_1 + \cdots + X_n}{n^\alpha} \xrightarrow[n \to \infty]{a.s.} m
\]
for some real number $m$. Show that almost surely $X_1 = 0$.
(Try to show that both $X_1, X_1^2$ are integrable, then derive a contradiction.)











Theorem 8.8 (Berry-Esseen). Suppose that $(X_n)_{n \ge 1}$ are iid random variables with $E X_1 = 0$, $E X_1^2 = \sigma^2 > 0$ and $E|X_1|^3 = \rho < \infty$. Set $Y_n := \frac{X_1 + \cdots + X_n}{n}$ and let $F_n$ be the CDF of $\sqrt{n} \cdot Y_n / \sigma$. Then for all $x, n$, if $\Phi(x)$ is the distribution function of the standard normal, we have that there is an absolute constant $C > 0$ such that
\[
|F_n(x) - \Phi(x)| \le \frac{C \rho}{\sigma^3 \sqrt{n}} .
\]
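
To get a feel for the $1/\sqrt{n}$ rate, the sketch below computes $\sup_x |F_n(x) - \Phi(x)|$ exactly for centered fair coin flips (an assumption made for this example: $X_i = \pm 1/2$, so $\sigma = 1/2$, $\rho = 1/8$, and $\rho / \sigma^3 = 1$). Since $F_n$ is a step function, the supremum is attained at an atom, either from the left or from the right. The function names are hypothetical.

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def kolmogorov_distance_fair_coin(n: int) -> float:
    """sup_x |F_n(x) - Phi(x)|, F_n the CDF of (S - n/2) / (sqrt(n)/2), S ~ Binomial(n, 1/2)."""
    cdf_left = 0.0  # value of F_n just to the left of the current atom
    worst = 0.0
    for k in range(n + 1):
        t = (k - n / 2) / (math.sqrt(n) / 2)  # location of the k-th atom
        cdf_right = cdf_left + math.comb(n, k) / 2**n
        worst = max(worst, abs(cdf_left - phi(t)), abs(cdf_right - phi(t)))
        cdf_left = cdf_right
    return worst

for n in (25, 100, 400):
    d = kolmogorov_distance_fair_coin(n)
    # Because rho / sigma^3 = 1 here, d * sqrt(n) is a lower bound for any admissible constant C.
    print(n, d, d * math.sqrt(n))
```

The product $d \sqrt{n}$ stays roughly constant, which shows that the $1/\sqrt{n}$ rate cannot be improved in general.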





8.1 Extra problems with proofs and/or hints 





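Exercise 8.9. Let $(X_n)_{n \ge 1}$ be iid random variables with $X_n \ge 0$, $E X_1 = 1$ and $\mathrm{Var}(X_1) = \sigma^2 \in (0, \infty)$. With $S_n := X_1 + \cdots + X_n$, show that
\[
2(\sqrt{S_n} - \sqrt{n}) \Rightarrow N(0, \sigma^2),
\]
as $n \to \infty$.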


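Proof Sketch. First, notice that
\[
2(\sqrt{S_n} - \sqrt{n}) = \int_n^{S_n} \frac{dx}{\sqrt{x}} .
\]
Now when $|S_n - n| \le n^{2/3}$, an event that happens with probability tending to one as $n \to \infty$, we have that
\[
\left| \int_n^{S_n} \frac{dx}{\sqrt{x}} - \frac{S_n - n}{\sqrt{n}} \right|
\le \left| \int_n^{S_n} \left( \frac{1}{\sqrt{x}} - \frac{1}{\sqrt{n}} \right) dx \right|
\le n^{2/3} \left( \frac{1}{(n - n^{2/3})^{1/2}} - \frac{1}{n^{1/2}} \right)
= n^{2/3} \int_{n - n^{2/3}}^{n} \frac{dx}{2 x^{3/2}}
\le \frac{n^{4/3}}{2 (n - n^{2/3})^{3/2}} \to 0,
\]
as $n \to \infty$. Now the only catch is to see that by the CLT, $\frac{S_n - n}{\sqrt{n}} \Rightarrow N(0, \sigma^2)$ as $n \to \infty$.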
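Exercise 8.10. Let $(X_n)_{n \ge 1}$ be iid random variables with $E X_1 = 0$ and $E[X_1^2] = \sigma^2 \in (0, \infty)$. Show that
\[
\frac{X_1 + \cdots + X_n}{(X_1^2 + \cdots + X_n^2)^{1/2}} \Rightarrow N(0, 1),
\]
as $n \to \infty$.

Proof Sketch. By the WLLN, in probability we have that
\[
\frac{X_1^2 + \cdots + X_n^2}{n \sigma^2} \xrightarrow{P} 1 .
\]
Since the function $y^{-1/2}$ is continuous at one, we also have that
\[
\left( \frac{X_1^2 + \cdots + X_n^2}{n \sigma^2} \right)^{-1/2} \xrightarrow{P} 1 .
\]
Now since convergence in probability implies convergence in distribution, and since by the CLT $\frac{X_1 + \cdots + X_n}{\sigma \sqrt{n}} \Rightarrow N(0,1)$, we see (multiplying the two sequences and applying Slutsky's theorem) that the claim is true.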
















9 Useful inequalities and asymptotic estimates 


9.1 Factorials, binomial coefficients and Stirling’s inequality 


We have that
\[
n! = \sqrt{2\pi n} \left( \frac{n}{e} \right)^n \left( 1 + \frac{1}{12n} + \frac{1}{288 n^2} + O(n^{-3}) \right) .
\]


(:) ~ ence a 


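We also have the inequalities that provide
\[
\frac{n^k}{k^k} \le \binom{n}{k} \le \frac{n^k}{k!} \le \left( \frac{ne}{k} \right)^k \tag{9.1a}
\]
\[
\sqrt{2\pi n} \cdot n^n e^{-n} \le n! \le e \sqrt{n} \cdot n^n e^{-n} \tag{9.1b}
\]
\[
\sqrt{2\pi n} \cdot n^n e^{-n} e^{1/(12n+1)} \le n! \le \sqrt{2\pi n} \cdot n^n e^{-n} e^{1/(12n)} . \tag{9.1c}
\]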

Inequalities on exponential functions and integrals 


We have that as n > oo, 


For w > 0, we have 











IA 


; l@|rr4 Dia" 
min : ‘ 
(n+1)!" nl! 


t2 o X 3 
E G —1+ 5%°)| <E nin aS : | ex?) 


t 3 X 3 
E in (A ex) (Lxj<e + Lisi) 









































Other inequalities 


• If $|z_i|, |w_j| \le \theta$ $\forall 1 \le i, j \le n$, then
\[
\left| \prod_{m=1}^{n} z_m - \prod_{m=1}^{n} w_m \right| \le \theta^{n-1} \sum_{m=1}^{n} |z_m - w_m| .
\]


• $|e^{it} - 1|^2 \le 4$.

• $\lim_{x \to 0} \frac{\log(1 + ax)}{x} = a$.
10 Misc key facts, results, and other reminders

• If $(A_i)_{i \ge 1}$ are almost sure events, then $A = \bigcap_{i \ge 1} A_i$ is an almost sure event, i.e., if $P(A_i) = 1$ for all $i$, then $P\left( \bigcap_{i \ge 1} A_i \right) = 1$.

• If $X$ is independent of itself, then $X$ is almost surely constant. One consequence of this is that if the $X_i$ are independent and $a_1 X_1 + \cdots + a_n X_n$ is almost surely constant with $a_1 \ne 0$, then $X_1$ is constant almost surely, as $X_1$ is equal to an expression independent of itself.


@:{SUpyaig. SoS te ey: 


• For $y \in \mathbb{R}$, $|e^{iy} - 1| \le |y|$;














• Variations of Inclusion-Exclusion: For any two sets $E, F \subset \Omega$, we can write:

(A) $\chi_{E \cup F} = \chi_E + \chi_F - \chi_{E \cap F}$; and
(B) $P(E \cup F) = P(E) + P(F) - P(E \cap F)$,

where $P(E \cap F) = P(E) \cdot P(F)$ iff $E, F$ are independent. Also, for any three sets $A, B, C$:
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.

• Special sequences of sets constructions: $A_n := \{\omega : |X(\omega)| > n\}$;
e Special choices of functions: Take f, = + Xjo1), OV fr = 4 *Xton3 


. Ja = b| = meted aed 


• $x + y = \min(x, y) + \max(x, y)$.

• $\chi_{[x, x+a]}(t) = \chi_{[t-a, t]}(x)$, which can be combined with an application of Fubini or Tonelli to swap the orders of integration in a double or multiple integral.

• Whenever you see convergence in probability, try to extract a subsequence which converges almost surely.

• Whenever you see (i) nonnegative $g_n$ and (ii) $g_n \to g$ a.e., immediately try Fatou's lemma.

• If you cannot prove convergence in probability directly, think proof by contradiction.

• Let $\Delta$ denote the symmetric difference of sets: $A \Delta B = (A \setminus B) \cup (B \setminus A)$.

• $E \setminus F \subset (E \setminus A) \cup (A \setminus F)$.

• $|A \setminus E|_e \le |A \Delta E|_e$.

• $f(x) = \sum_j c_j \chi_{E_j}(x) \implies \int f = \sum_j c_j |E_j|$.

• $CF \setminus CE = E \setminus F$ (with $C$ denoting the complement) and $E_k = F_k \cup (E_k \setminus F_k)$ when $F_k \subset E_k$.


• Let $E_k^{(j)} := \{x \in E_k : j - 1 \le |x| < j\} = (E_k \cap \{x : |x| < j\}) \setminus \{x : |x| < j - 1\}$. Then $E_k^{(j)}$ is bounded and measurable. May need to assume that $|E_k| < \infty$ for some $k$ (the other case is easier).


e BAtlas 





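• If $|E| < \infty$, then $\chi_E(x) \in L^1(\mathbb{R})$.

• Continuous on $E$ $\implies$ measurable on $E$.

• $\mu(\{x : |f_n - f| + |g_n - g| \ge \varepsilon\}) \le \mu(\{x : |f_n - f| \ge \varepsilon/2\}) + \mu(\{x : |g_n - g| \ge \varepsilon/2\})$.

• $\inf_k x_k \le \liminf_{k \to \infty} x_k \le \limsup_{k \to \infty} x_k \le \sup_k x_k$, where $\liminf_{n \to \infty} x_n = \lim_{n \to \infty} \inf_{m \ge n} x_m = \sup_{n \ge 0} \inf_{m \ge n} x_m$ and $\limsup_{n \to \infty} x_n = \lim_{n \to \infty} \sup_{m \ge n} x_m = \inf_{n \ge 0} \sup_{m \ge n} x_m$.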


fie) =—-W, EM ¢P forp<g@. 


(a log x)!/4 


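• Let $A_n = \{|f| > n\}$ and $A := \{|f| = \infty\}$. Then $A_{n+1} \subset A_n$ and $A = \bigcap_{n \ge 1} A_n$, so that $A_n \searrow A$. By continuity (provided $|A_n| < \infty$ for some $n$), $\lim_{n \to \infty} |A_n| = |A| = 0$ when $f$ is finite a.e.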














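• $f$ integrable $\implies$ for all $\varepsilon > 0$, $\exists \delta > 0$ such that for any measurable $A \subset \mathbb{R}$: $|A| < \delta \implies \int_A |f| < \varepsilon$.

• $\liminf(s_n) + \liminf(t_n) \le \liminf(s_n + t_n) \le \limsup(s_n + t_n) \le \limsup(s_n) + \limsup(t_n)$, for any sequences $\{s_n\}$ and $\{t_n\}$.

• $ab \le \frac{1}{2}(a^2 + b^2)$ and $ab \le \frac{1}{4}(a + b)^2$.

• $1 + 1/2 + \cdots + 1/n - (1 + 1/n) \le \log(n) \le 1 + 1/2 + \cdots + 1/n$, so that
\[
\frac{1 + 1/2 + \cdots + 1/n}{\log(n)} \to 1
\]
as $n \to \infty$.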


• The characteristic function of a random variable with mean 0 and variance 1 satisfies $f(\xi) = 1 - \xi^2/2 + o(\xi^2) = e^{-\xi^2/2 + o(\xi^2)}$.

• If $X_1, X_2$ are independent, then the characteristic function of $X_1 + X_2$ is the product of their characteristic functions: $\phi(u) = \phi_1(u) \phi_2(u)$.

• A consequence of Jensen's inequality for expectation (for $X > 0$): $E[X] E[1/X] \ge E[X] / E[X] = 1$.

• Convergence in probability implies convergence in distribution.

• If the characteristic function of $X_n$ converges pointwise to that of $X$, then $X_n \Rightarrow X$.

• By the CLT, $Y_n := n^{1/2} \left( \frac{X_1 + \cdots + X_n}{n} - \mu \right)$ converges in distribution to a normal random variable with mean 0 and variance $\sigma^2$.


By Cauchy-Schwarz, 





Oe Le eA 2 I 2 





IA 
ee 3 


~ VES, = Ta) P An) + —E(Sn — Ta)” Xag 


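• Using monotone convergence,
\[
\sum_{n \ge 0} P(X^2 / c^2 > n) = E\left[ \sum_{n \ge 0} \chi_{\{X^2 / c^2 > n\}} \right] = E\left[ \lceil X^2 / c^2 \rceil \right] \le 1 + c^{-2} E(X^2) .
\]

• In the Borel-Cantelli lemma, interpret $P(\limsup A_n)$ as $P(A_n \text{ occurs infinitely often})$.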







• If $X_n \Rightarrow X$ (weak convergence), then $\liminf_{n \to \infty} P(X_n > t) \ge P(X > t)$.

• If $\{X_n\}_{n \ge 1}$ are uniformly integrable, then the sequence $\{E[X_n]\}_{n \ge 1}$ is bounded.

• We have: $\log(1 - t) = -t + O(t^2)$ for small $t$.

• $P(Y < X) \le \sum_{r \text{ rational}} P(Y < r < X)$.